3rd May'24 - AI insights.. Neatly distilled

The past 2 weeks in the #GenAI world were action-packed, as always.
Quick recap as of 3rd May 2024.

Whiskey pairing suggestion.. Noah’s Mill Bourbon by Willett. I am a big supporter of small batch bourbons from KY. If you can find a bottle of Noah’s Mill batched pre-2016, jump on it. Distinctly nutty aroma, notes of seasoned charred oak, and earthy palate.

Snowflake Arctic LLM

  • Released a truly open model optimized for enterprise relevant benchmarks

  • 480B parameters (128x3.66B Mixture of Experts + 10B dense model), 17B parameters active at a time

  • Leverages a hybrid architecture, a “Dense-MoE hybrid transformer”, vs. Dense (Llama2/3) or MoE (Mixtral, Grok, DBRX)

  • Apache 2.0 license provides ungated access to weights, code, data recipes, and research insights

  • Snowflake is not trying to build a universally awesome LLM. They are focusing on coding (HumanEval+ and MBPP+), SQL generation (Spider) and instruction following (IFEval), which are better aligned with enterprise AI use cases. So it’s fair that Arctic trails Llama 3 70B in other areas, e.g. world knowledge (MMLU) and math (GSM8K)

  • Chooses 2 active of 128 experts, whereas DBRX chooses 4 of 16 experts and Mixtral and Grok choose 2 of 8. So despite having 480B parameters overall, only about 17B are used per token, which keeps inference very quick

  • Announcement here, try it free here
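The parameter arithmetic behind the Arctic bullets can be checked with a few lines of Python. This is a rough sketch, not Snowflake's own accounting: the per-expert size of about 3.66B is an assumption taken from the public announcement, the helper names are made up for illustration, and the totals are rounded.

```python
from math import comb

# All sizes are in billions of parameters and are approximate.
DENSE_B = 10.0      # dense transformer part
EXPERT_B = 3.66     # ASSUMPTION: approximate size of one expert
NUM_EXPERTS = 128
TOP_K = 2           # experts chosen per token


def total_params_b(dense_b: float, expert_b: float, num_experts: int) -> float:
    """Total parameters: the dense part plus every expert."""
    return dense_b + num_experts * expert_b


def active_params_b(dense_b: float, expert_b: float, top_k: int) -> float:
    """Parameters used for one token: the dense part plus the chosen experts."""
    return dense_b + top_k * expert_b


total = total_params_b(DENSE_B, EXPERT_B, NUM_EXPERTS)
active = active_params_b(DENSE_B, EXPERT_B, TOP_K)
print(f"total ~{total:.0f}B, active ~{active:.0f}B ({active / total:.1%} per token)")

# (number of experts, experts chosen per token), as listed in the bullets above.
routing = {"Arctic": (128, 2), "DBRX": (16, 4), "Mixtral / Grok": (8, 2)}
for name, (n, k) in routing.items():
    print(f"{name}: {comb(n, k)} possible expert combinations per token")
```

The takeaway is that only a small slice of the weights does work on any given token, which is why a 480B model can still be quick at inference.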

Apple

  • Apple broke its walled garden reputation and unveiled an Open-source Efficient Language Model (OpenELM) for on-device AI.

  • 270M, 450M, 1.1B and 3B parameter models trained on 1.8T tokens. The big innovation is that OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy.

  • Model here and technical paper here

  • Acquired Datakalab, which specializes in data compression and computer vision for on-device AI. Apple continues its acquisition spree, following DarwinAI and 23 startups in 2023

  • Rumored to be evaluating OpenAI & Google for GenAI collaboration

  • Rumored to be working on its own AI server processor using TSMC’s 3nm process, targeting 2H25
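To make the layer-wise scaling idea from the OpenELM bullet concrete, here is a minimal sketch. It assumes the general scheme of linearly growing the number of attention heads and the feed-forward width from the first layer to the last. The function name and all numeric values (`alpha`, `beta`, model width, head size) are hypothetical and chosen only for illustration. They are not Apple's published configuration.

```python
def layerwise_config(num_layers, d_model, head_dim,
                     alpha=(0.5, 1.0), beta=(0.5, 4.0)):
    """Give each layer its own head count and FFN width.

    alpha scales attention heads, beta scales the FFN multiplier; both are
    interpolated linearly from the first layer to the last (assumed scheme).
    """
    configs = []
    for i in range(num_layers):
        # t runs from 0.0 at the first layer to 1.0 at the last layer.
        t = i / (num_layers - 1) if num_layers > 1 else 0.0
        a = alpha[0] + (alpha[1] - alpha[0]) * t
        b = beta[0] + (beta[1] - beta[0]) * t
        heads = max(1, round(a * d_model / head_dim))
        ffn_dim = round(b * d_model)
        configs.append({"layer": i, "heads": heads, "ffn_dim": ffn_dim})
    return configs


# Hypothetical toy model: 4 layers, width 1024, 64-dim heads.
for cfg in layerwise_config(num_layers=4, d_model=1024, head_dim=64):
    print(cfg)
```

Compared with giving every layer the same shape, this spends fewer parameters in the early layers and more in the later ones for the same overall budget.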

IBM

  • Agreed to acquire HashiCorp for $6.4B to further enhance enterprise AI workloads across hybrid cloud. Great addition to help clients optimize data & compute costs. Details here

Mysterious gpt2 drama

  • A mysterious “gpt2-chatbot” model showed up on LMSYS Chatbot Arena with very impressive GPT-4 level results. It solved a tough math olympiad problem zero-shot! It was rumored to be a GPT-4.5 or GPT-5 prototype put out for a masked user test. It was quickly taken down.

OpenAI

  • Jensen Huang hand-delivered Nvidia’s first H200, billed as the world’s most powerful GPU, to OpenAI. It’s a beast.

  • OpenAI released their new ‘memory’ capabilities to all users, to get them closer to a personalized assistant.

  • OpenAI made a content deal with the Financial Times to train AI

  • Mentions of search.openai.com started showing up in logs, hinting at OpenAI’s seemingly inevitable jump into search, leveraging Bing

Google

  • Google made a $6M/yr content deal with News Corp to train AI

  • Google launched Med-Gemini, which significantly outperforms GPT-4 and crushes the MedQA-USMLE benchmark with 91.1% accuracy. Paper here

Open-source Medical LLMs

  • OpenBioLLM's Llama-3 based open-source 70B & 8B models outperform GPT-4, Gemini, Meditron-70B, Med-PaLM-1 & Med-PaLM-2 on the medical-domain leaderboard here

  • Profluent launched OpenCRISPR-1, the world's first open-source AI gene editor capable of editing the human genome. The team trained LLMs on a vast dataset of diverse CRISPR systems to generate millions of new CRISPR-like proteins not found in nature. Details here

Meta

  • Meta’s Llama3 has already exceeded 2,750 variations and 2M+ downloads in under 2 weeks! Llama3 on Groq delivers a blisteringly fast 800 tokens/sec.

  • Meta Ray-Ban glasses [here] got a GenAI multi-modal upgrade, so an LLM can now process visual data from the glasses’ built-in camera and offer relevant insights while you are chilling on the beach. Despite my disappointment with Rabbit R1, I am still bullish on intelligent edge devices.

Misc

  • AWS GAed Amazon Q to take on copilot workflows. In certain use cases, especially when baked into AWS services and IBM Consulting assets, it’s been working very well for our preview clients. [link]

  • Adobe launched Firefly 3 and integrated it right into Photoshop workflows. Also introduced project Blurry HD to upscale low-res footage.

  • Cognition Labs, the company behind Devin (billed as the first AI software developer), is seeking a whopping $2B valuation just 6 months after being founded. That said, there was some noise last week around the accuracy and effectiveness of the Devin demo.

  • Synthesia upgraded their digital avatars to have facial expressions [link]. HeyGen avatars have been great, and in my side-by-side, Synthesia has really caught up.

Robotics

  • Astribot S1, a humanoid robot developed by a Chinese firm, showed off some insane agility, dexterity and accuracy in performing repetitive day-to-day tasks, e.g. pouring wine : ) If the video is legit, this promises to be a huge leap forward in robot capabilities.

Good Tech

  • NIST launched a GenAI program to benchmark GenAI tech and identify AI-generated content [link]

  • World Economic Forum published a great report on the role of AI in education [link]

  • Thorn and All Tech Is Human joined forces with the tech giants to address child safety in the era of GenAI [link]

Notable recent papers/reports:

  • Crunchbase released their Tech Trends report here with some great industry insights

  • Instead of using one LLM-as-a-judge, Cohere makes the case for a panel of diverse models. [link]

  • OSWorld released a benchmarking framework to evaluate LLM autonomous agents [link]
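The panel-of-judges idea mentioned above can be sketched as a simple majority vote. This is a minimal illustration, not the method from the Cohere paper: the judges here are hypothetical stand-in functions rather than real model calls, and the label names are made up. In practice each judge would wrap a different LLM.

```python
from collections import Counter
from typing import Callable, Dict, Sequence, Tuple

# A judge takes (question, answer) and returns a label such as "correct".
Judge = Callable[[str, str], str]


def panel_verdict(judges: Sequence[Judge], question: str,
                  answer: str) -> Tuple[str, Dict[str, int]]:
    """Ask every judge for a label and return the most common one.

    With an even panel a tie is possible; the label that was voted first
    wins in that case, so an odd number of judges is the safer choice.
    """
    votes = [judge(question, answer) for judge in judges]
    tally = Counter(votes)
    verdict, _ = tally.most_common(1)[0]
    return verdict, dict(tally)


# Hypothetical stand-in judges; a real panel would call different models.
def strict_judge(question: str, answer: str) -> str:
    return "correct" if answer.strip() == "4" else "incorrect"


def lenient_judge(question: str, answer: str) -> str:
    return "correct" if "4" in answer else "incorrect"


def grumpy_judge(question: str, answer: str) -> str:
    return "incorrect"


panel = [strict_judge, lenient_judge, grumpy_judge]
verdict, tally = panel_verdict(panel, "What is 2 + 2?", "4")
print(verdict, tally)
```

Using several smaller, different judges spreads out the quirks of any single model, which is the intuition behind preferring a panel over one large judge.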