20 May'24 - AI insights neatly distilled

Shobhit Varshney
May 20, 2024

The past two weeks in the #GenAI world were 🔥🔥🔥, with massive announcements from IBM, OpenAI, Google, and more. I'm pumped for IBM THINK, Microsoft Build, and Dell Tech World this week! Here's a quick recap as of the morning of May 20.

But first, 🥃 whiskey pairing suggestion: Yamazaki 18 by Suntory (before it got crazy expensive). In 2015-16, I spent a year in Japan and fell in love with the Suntory distillery. There's a great infographic here on how Japanese whiskey prices have climbed since they swept a few awards. Fun fact #1 - the name "Suntory" comes from the founder's name, ‘Torii san’, with the word order reversed. Fun fact #2 - Suntory sent samples to the International Space Station to age whiskey in zero-gravity conditions. In 2018, they ran out of aged stock because, 15-25 years earlier, they hadn't anticipated the crazy spike in demand. If you see an aged Yamazaki, jump on it.

IBM

The Aha!

IBM had multiple mic-drop releases and is fully embracing open source. Easily our biggest GenAI week, marking watsonx's 1st birthday!
IBM open-sourced its flagship Granite models (details in my post here, paper here, blog here). They crushed the competition in their model weight class. This is huge, since enterprises can freely adapt the models, run them on-prem or on any cloud, and build their own competitive advantage instead of outsourcing their AI.
https://huggingface.co/ibm-granite
Red Hat and IBM Research released InstructLab, an open-source project that lowers the barrier to creating synthetic training data and enables the community to contribute knowledge and skills to a base model. This is the first time we've seen a scalable method for crowdsourcing improvements to an LLM. Details in my post here, keynote here, and paper here.
IBM takes first place on the large-scale text-to-SQL BIRD-SQL leaderboard with ExSL + Granite-20b-Code, beating out a host of GPT-4-based solutions. David’s post here.
IBM launched a new product, watsonx BI Assistant, an AI-powered BI advisor, at the Gartner conference. Alvin's post here, with more updates at THINK this week.

The Uh-Oh!

IBM has chosen to lead the enterprise space with small, trusted, open-source models. Competitors with massive multimodal models and B2C scale get to monetize bigger models and trickle their learnings about reasoning and efficiency down to smaller models. Open-sourcing Granite and InstructLab should accelerate community innovation.
To double down on enterprise use cases, we need to focus more on industry- and domain-specific small models. We are accelerating that with InstructLab.
We need more push on pre-built evaluations, end-to-end pre-configured flows, and LLM agents. More to come soon.

OpenAI

The Aha!

OpenAI released GPT-4o, its new flagship model that can reason across audio, vision, and text in real time. The ‘o’ stands for omni, but I think they should have called it GPT-4OMG :) It's that good! Great work, prodigy Prafulla & team.
More details on the OpenAI announcement in my post here, and here is a great comparison of price, performance, and speed vs. other LLMs.
Real-time reasoning over voice, video, and images unlocks great experiences that weren't possible before:
- Helping a kid with math homework [here] could democratize education and give every child access to the best educators
- A blind person using 4o's vision reasoning to interact with the world [here] is life-changing
- AI calling another AI in a call center to collaborate on solving an issue is the future of customer care [here]. I recently saw a dating site where a GPT agent takes on your personality and goes on virtual dates with other users' GPT agents to find you a great match!
- AI running a meeting [here] and remembering everything.
- My enterprise clients will appreciate feeding in a 10-minute video of a warehouse or a store planogram and instantly making sense of it, or assisting a mechanic with real-time visual diagnostics.
More demos at https://openai.com/index/hello-gpt-4o/

There are a ton of capabilities OpenAI didn't mention in the announcement that are worth digging deeper into, particularly how well it renders text inside images and keeps characters consistent across images.
GPT-4o (tested anonymously as gpt2-chatbot and im-a-good-gpt2-chatbot) smoked the LMSYS leaderboard and is the new king. Rooting for the open Llama 3 400B once it's done training.
#1 on LMSYS https://chat.lmsys.org/?leaderboard
On traditional benchmarks, GPT-4o achieves GPT-4 Turbo-level performance on text, reasoning, and coding intelligence, while setting new high watermarks on multilingual, audio, and vision capabilities. I am seeing increased accuracy with RAG and complex logic.
The Mac desktop version has been fun. Interestingly, despite all the money from Microsoft, they released it only on Mac. This competes directly with Windows Copilot, so I'm assuming Satya will announce something at Build.

The Uh-Oh!

OpenAI dissolved its entire ‘Superalignment’ team, which focused on the existential dangers of AI and on governing superintelligence. This followed the departures of co-founder Ilya Sutskever, Jan Leike, and team. Shiny updates vs.
OpenAI had to pull its Scarlett Johansson-sounding AI voice.
In an interview after the GPT-4o launch, Sam insisted that 4o's ~320 millisecond latency is human-level and that cloud-based AI is sufficient. He punted on whether on-device, 4o-level intelligence is needed (think Siri rumors).
MIT Technology Review reported that GPT-4o's Chinese tokens may have been polluted with spam and p-rn.
I still haven’t found technical details on how enterprises will be able to securely ground 4o’s vision/voice/reasoning to corporate data.

Google

The Aha!

Google is fully embracing its Gemini era, infusing Gemini consistently across all its products and enabling developers to build amazing experiences. Here is their summary of the 100 things they announced.
Here is my take on I/O. Getting relevant insights from massive amounts of text, docs, images, and videos has always been their secret sauce. They are brilliantly positioned in the AI race, offering both cutting-edge and open models. Hanging out with product leads gives you a great appreciation of just how complex these use cases are at scale.
10-minute recap of Google I/O that you shouldn't miss

Google made impressive improvements to its Gemini 1.5 Pro model from February to May. It also introduced the faster, cheaper Flash model.
They are doing a great job with LLM agents, executing complex multi-step workflows seamlessly within their apps. Early previews with clients have been promising.
The internet was split on AI Overviews, which present a summarized, contextual, interactive answer at the top and push the links way down. I think it's brilliant as long as it learns from my interactions to become hyper-personalized. Moving from link retrieval to answer retrieval is the right direction: more TikTok-style answers vs. endless Netflix-style researching.
Overall, very good progress with projects like Astra (multimodal reasoning), Veo (text to 1080p video), and AlphaFold 3. Product updates like Ask Photos (AI insights from your photos) are super slick.

The Uh-Oh!

Another big flashy demo on stage with an inaccurate answer
Google is further alienating content creators with more backlash
Gemini has run into a lot of backlash with incorrect image creation
It's very unclear which features will be Generally Available to clients and when, or how enterprises will ground them in their own content.
Gemini is not optimal in the price/quality battle yet

Misc important AI news

Falcon 2 - UAE's Technology Innovation Institute released Falcon 2, including a vision model, and it competes well with Llama 3 and Gemma in its class.
Microsoft is building its GPT-4 competitor, MAI-1, led by Mustafa Suleyman (co-founder of DeepMind and ex-CEO of Inflection AI, which Microsoft paid $650M to absorb).
Anthropic hired Instagram co-founder Mike Krieger to lead product. Anthropic also released its prompt enhancer, which creates a robust, production-ready prompt using the right techniques.
Apple released its thinnest device ever: the ~5 mm iPad Pro with a super powerful M4 chip. It's their most technically advanced yet "meh" product. iPadOS is frustratingly holding back an insane product. Lots of rumors about WWDC, including OpenAI potentially enhancing Siri.

--------
🔔 If you like this kind of content, connect with me on LinkedIn. ♻️ Recommend this free ‘AI with Whiskey’ newsletter to your friends.
--------