Meta's Massive AI Clusters for GenAI

Good Morning! Meta unveils massive AI clusters containing 24,576 NVIDIA H100 GPUs each for developing artificial general intelligence (AGI) responsibly. Spotify accuses Apple of deliberately delaying approval for an updated Spotify iOS app that removes Apple's in-app payment system, claiming it violates the EU's new Digital Markets Act antitrust rules. NVIDIA's GTC 2024 conference from March 18-21 promises an immersive experience with over 900 sessions and workshops.

Meta's Massive AI Clusters for GenAI

Meta is working towards building artificial general intelligence (AGI) that is open and responsibly developed. As part of this ambition, Meta has unveiled details on two massive AI computing clusters.

Each cluster contains 24,576 powerful NVIDIA H100 GPUs. The network fabrics used are:

  • One cluster uses RDMA over Converged Ethernet (RoCE)

  • The other uses NVIDIA InfiniBand  

Both fabrics interconnect endpoints at 400 Gbps. The clusters are built on Meta's open Grand Teton hardware platform.

Storage Solutions: Meta has implemented custom storage for these clusters. It pairs a Linux Filesystem in Userspace (FUSE) layer, backed by Meta's Tectonic distributed flash storage system, with the Hammerspace parallel network file system. Together, these allow interactive debugging across thousands of GPUs simultaneously.

Initially, the huge clusters didn't perform as well as optimized smaller systems. However, after a series of optimizations, the expected performance was achieved:

  • Tweaks to the job scheduler, network routing, and the NCCL communication library

  • Improvements to parallelization techniques

  • Enhancements to the PyTorch AI framework

These clusters are part of a larger AI infrastructure plan. Meta aims to grow to 350,000 H100 GPUs by the end of 2024, as part of a portfolio with compute power equivalent to nearly 600,000 H100 GPUs.

Spotify Says Apple Is Delaying App Update On Purpose

Spotify is accusing Apple of deliberately delaying approval for an updated version of Spotify's iOS app. Spotify claims this violates the European Union's new Digital Markets Act (DMA) rules.

What Spotify Did

  • On March 5th, Spotify submitted an app update to Apple that removes Apple's in-app payment system for subscriptions.

  • Instead, the new version prompts users to visit Spotify's website to subscribe, a change permitted under the DMA, which took effect on March 7th.

However, nine days after submitting the update, Apple still had not approved the new app version. A Spotify spokesperson stated the delay "goes against [Apple's] claim that they approve app updates within 24 hours" and "ignores the timeline the EU Commission set out."

For Context: This renews a long-running antitrust battle between the two companies. The EU recently fined Apple €1.8 billion for disadvantaging music streaming rivals such as Spotify on Apple's tightly controlled App Store.

GTC 2024: The Biggest AI Event You Can't Miss

NVIDIA's GTC 2024 conference runs from March 18-21 in San Jose, California, and online.

This year's event features more than 900 sessions, workshops, and panel discussions exploring the cutting edge of accelerated computing and generative AI. The headline item is the hugely anticipated keynote by NVIDIA CEO Jensen Huang, which promises to unveil AI advances set to transform industries.

Explore the show floor, where 300+ exhibitors will be showing off demos. Chat with well-known experts and fellow attendees at dedicated networking events. You can also check out areas showcasing automotive, healthcare, robotics, and other AI applications.

🔥 More Notes

  • Google announced that Google I/O 2024, its annual developer conference, will take place online on May 14th. The event will cover topics like AI application development, mobile and web development, cloud scaling, and more through keynotes, sessions, workshops, and demos.

  • Google has updated its Safe Browsing feature to perform real-time checks for potentially unsafe websites in a privacy-preserving manner, addressing the issue of short-lived unsafe sites evading detection. The new approach converts URLs into hashed prefixes, checks them against a database of unsafe site hashes while keeping the user's identity private, and displays warnings immediately upon detecting an unsafe site.

  • AWS has optimized the CloudFormation stack creation process, splitting resource creation and stabilization into two phases, which allows creating other resources earlier and speeds up the overall stack provisioning process. The new approach also introduces a retry capability and a new CONFIGURATION_COMPLETE event to better orchestrate resource provisioning.
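To make the Safe Browsing note above more concrete, here is a minimal sketch of the hashed-prefix idea. It is an illustration under stated assumptions, not Google's implementation: the function names, the 4-byte prefix length, and the example URLs are all hypothetical, and the real system also canonicalizes URLs and routes lookups through a privacy-preserving server, which this sketch leaves out.

```python
import hashlib

# Assumption for illustration: compare only the first 4 bytes of the hash.
PREFIX_BYTES = 4


def hash_prefix(url_expression: str, length: int = PREFIX_BYTES) -> bytes:
    """Return the first `length` bytes of the SHA-256 digest of a URL string."""
    digest = hashlib.sha256(url_expression.encode("utf-8")).digest()
    return digest[:length]


def might_be_unsafe(url_expression: str, unsafe_prefixes: set) -> bool:
    """True if the URL's hash prefix appears in the set of unsafe prefixes.

    A prefix match is only a hint: different URLs can share a prefix, so a
    real client would confirm against the full hash before showing a warning.
    """
    return hash_prefix(url_expression) in unsafe_prefixes


# Hypothetical database of unsafe-site prefixes.
unsafe_prefixes = {hash_prefix("malware.example/download")}

print(might_be_unsafe("malware.example/download", unsafe_prefixes))
print(might_be_unsafe("safe.example/", unsafe_prefixes))
```

Because only a short prefix is compared, the party doing the lookup learns very little about which exact URL was visited, which is the privacy property the note describes.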

YouTube Spotlight

How Data Structures & Algorithms are Actually Used

In this video, Forrest discusses how data structures and algorithms are used in real-world applications, using examples from social media apps and an AI puzzle game. He explains the use of data structures such as arrays, hashmaps, and sets, and demonstrates how sorting algorithms are implemented to organize content on a social media feed. He also shows how the puzzle game relies on hashmaps, arrays, and the breadth-first search (BFS) algorithm.
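For readers who want to see how these pieces fit together, here is a small, generic sketch of BFS built on a hashmap and a queue. It is not taken from the video: the `shortest_path` function and the example graph are hypothetical, chosen only to show the technique.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search over an adjacency hashmap.

    Returns the shortest path (fewest edges) from start to goal as a list
    of nodes, or None if the goal cannot be reached.
    """
    parents = {start: None}  # hashmap doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical puzzle states, with edges representing legal moves.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(shortest_path(graph, "A", "E"))  # ['A', 'B', 'D', 'E']
```

BFS explores states in order of distance from the start, so the first time it reaches the goal it has found a path with the fewest moves, which is why it suits puzzle solvers.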
