3rd May'24 - AI insights.. Neatly distilled

Shobhit Varshney
May 03, 2024

The past 2 weeks in the #GenAI world were action-packed as always.
Quick recap as of 3rd May 2024.

Whiskey pairing suggestion: Noah’s Mill Bourbon by Willett. I am a big supporter of small-batch bourbons from KY. If you can find a bottle of Noah’s Mill batched pre-2016, jump on it. Distinctly nutty aroma, notes of seasoned charred oak, and an earthy palate.

Snowflake Arctic LLM

Released a truly open model optimized for enterprise relevant benchmarks
480B parameters (128x3.66B Mixture of Experts + 10B dense model), 17B parameters active at a time
Leverages a hybrid architecture, a “Dense-MoE hybrid transformer”, vs. Dense (Llama 2/3) or MoE (Mixtral, Grok, DBRX)
Apache 2.0 license provides ungated access to weights, code, data recipes, and research insights
Snowflake is not trying to build a universally awesome LLM. They are focusing on coding (HumanEval+ and MBPP+), SQL generation (Spider) and instruction following (IFEval), which are better aligned with enterprise AI use cases. So it’s fair that Arctic trails Llama 3 70B in other areas, e.g. world knowledge (MMLU) and math (GSM8K)
Chooses 2 active experts out of 128, vs. DBRX, which chooses 4 of 16, and Mixtral and Grok, which choose 2 of 8. So despite being 480B parameters overall, it’s very quick at inference
Announcement here, try it free here
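
A quick back-of-the-envelope check of the Arctic numbers above, as a rough sketch only. It assumes about 3.66B parameters per expert (the per-expert size at which the total lands near 480B) and ignores embeddings, attention sharing and other details; the function name moe_params is made up for illustration.

```python
# Rough parameter arithmetic for a dense + MoE hybrid model.
# All sizes are in billions and are approximations, not official specs.

def moe_params(dense_b, expert_b, num_experts, active_experts):
    """Return (total, active) parameter counts in billions."""
    total = dense_b + num_experts * expert_b
    active = dense_b + active_experts * expert_b
    return total, active

# Arctic: 10B dense part + 128 experts of ~3.66B each, 2 experts active per token.
total, active = moe_params(dense_b=10, expert_b=3.66, num_experts=128, active_experts=2)
print(f"Arctic: total ~{total:.0f}B, active ~{active:.0f}B")

# Share of experts consulted per token, using the routing figures quoted above.
routing = {"Arctic": (2, 128), "DBRX": (4, 16), "Mixtral": (2, 8), "Grok": (2, 8)}
for name, (k, n) in routing.items():
    print(f"{name}: {k} of {n} experts = {k / n:.1%} active")
```

The takeaway matches the bullet above: only a small slice of the 480B is touched for any given token, which is why inference stays quick.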

Apple

Apple broke its walled-garden reputation and unveiled an Open-source Efficient Language Model (OpenELM) for on-device AI.
270M, 450M, 1.1B and 3B parameter models trained on 1.8T tokens. The big innovation is that OpenELM uses a layer-wise scaling strategy to allocate parameters efficiently within each layer of the transformer model, leading to enhanced accuracy.
Model here and technical paper here
Acquired Datakalab, which specializes in data compression and computer vision for on-device AI. Apple continues its acquisition spree with DarwinAI and 23 startups in 2023
Rumored to be evaluating OpenAI & Google for GenAI collaboration
Rumored to be working on its own AI server processor using TSMC’s 3nm process, targeting 2H25

IBM

Acquired HashiCorp for $6.4B to further enhance enterprise AI workloads across hybrid cloud. Great addition to help clients optimize data & compute costs. Details here

Mysterious gpt2 drama

A mysterious “gpt2-chatbot” model showed up on the LMSYS Chatbot Arena with very impressive GPT-4 level results. It solved a tough math olympiad problem zero-shot! It was rumored to be a GPT-4.5 or GPT-5 prototype put out for a masked user test. It was quickly taken down.

OpenAI

Jensen Huang hand-delivered the world’s first Nvidia DGX H200 to OpenAI. It’s a beast.
OpenAI released their new ‘memory’ capabilities to all users, to get them closer to a personalized assistant.
OpenAI made a content deal with the Financial Times to train AI
Mentions of search.openai.com started showing up in logs, hinting at OpenAI’s inevitable jump into search, leveraging Bing

Google

Google made a $6M/yr content deal with News Corp to train AI
Google launched Med-Gemini, which significantly outperforms GPT-4 and crushes the MedQA-USMLE benchmark with 91.1% accuracy. Paper here

Open-source Medical LLMs

OpenBioLLM, a pair of Llama 3-based open-source 70B & 8B models, outperforms GPT-4, Gemini, Meditron-70B, Med-PaLM-1 & Med-PaLM-2 on the medical-domain leaderboard here
Profluent launched OpenCRISPR-1, the world's first open-source AI gene editor capable of editing the human genome. They trained LLMs on a vast dataset of diverse CRISPR systems to generate millions of new CRISPR-like proteins not found in nature. Details here

Misc

AWS GAed its Amazon Q to take on co-pilot workflows. In certain use cases, especially when baked into AWS services and IBM Consulting assets, it’s been working very well for our preview clients. [link]
Adobe launched Firefly 3 and integrated it right into Photoshop workflows. Also introduced project Blurry HD to upscale low-res footage.
Cognition Labs, the parent company of Devin (billed as the first AI software developer), is seeking a whopping $2B valuation just 6 months after being founded, although there was some noise last week around the accuracy and effectiveness of the Devin demo.
Synthesia upgraded their digital avatars to have facial expressions [link]. HeyGen avatars have been great, and in my side-by-side, Synthesia has really caught up.

Robotics

Astribot S1, a humanoid robot developed by a Chinese firm, showed off some insane agility, dexterity and accuracy in performing repetitive day-to-day tasks, e.g. pouring wine : ) If the video is legit, this promises to be a huge leap forward in robot capabilities.

Good Tech

NIST launched a GenAI program to benchmark GenAI tech and identify AI-generated content [link]
World Economic Forum published a great report on the role of AI in education [link]
Thorn and All Tech Is Human joined forces with the tech giants to address child safety in the era of GenAI [link]

Notable recent papers/reports:

Crunchbase released their Tech Trends report here with some great industry insights
Instead of using one LLM-as-a-judge, Cohere proposes using a panel of diverse models. [link]
OSWorld released a benchmarking framework to evaluate autonomous LLM agents [link]