Newwz.Space

Check out different kinds of news here.

Attention in SRAM on Tenstorrent Grayskull

https://arxiv.org/abs/2407.13885...

Implementations of the Transformer's self-attention layer can achieve significant speedups when they use SRAM instead of DRAM. The Tenstorrent Grayskull architecture provides a large SRAM, distributed across a grid of cores. This work presents a fused kernel for Grayskull that uses only this SRAM and combines the matrix multiplication, attention score scaling, and Softmax operations. A dedicated Softmax kernel that also uses the SRAM and a CPU implementation that serves as a baseline are presented as well. On Grayskull, the Softmax operation consumes most of the runtime when attention weights are computed from queries and keys. The dedicated Softmax kernel is up to $10 \times$ faster than the CPU implementation, and the Softmax implementation inside the fused kernel is approximately $1.8 \times$ faster than the dedicated Softmax kernel. The time and memory complexity of all implementations is quadratic in sequence length. Currently, the Grayskull e150 is approximately $30 \times$ cheaper for the general public than an Nvidia H100 PCIe (a state-of-the-art GPU) and offers approximately $1.5 \times$ more SRAM.
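
As a rough illustration of what the abstract means by computing attention weights from queries and keys, the three combined steps (matrix multiplication, score scaling, Softmax) can be sketched in plain Python. This is not the paper's Grayskull kernel: the function name `attention_weights`, the nested-list layout, the 1/sqrt(d) scaling factor, and the max-subtraction for numerical stability are assumptions based on the standard Transformer formulation, not details given above.

```python
import math

def attention_weights(queries, keys):
    """Return softmax(Q K^T / sqrt(d)) as a list of rows (hypothetical helper).

    queries: n rows of length d; keys: m rows of length d.
    The result has n * m entries, so for self-attention (n == m) both time
    and memory grow quadratically with sequence length.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)  # assumed standard scaling factor
    weights = []
    for q in queries:
        # Matrix multiplication and scaling: one row of Q K^T / sqrt(d).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        # Softmax over the row; subtracting the row maximum avoids overflow.
        row_max = max(scores)
        exps = [math.exp(s - row_max) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```

Each output row is produced from its scores without first storing the full score matrix, which loosely mirrors the idea of fusing the three operations. The on-chip data movement that the paper is concerned with is not modeled here.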

AI-Powered Bug Hunting – Evolution and Benchmarking [pdf]

Wednesday 31 Jul 2024

resource

Robin Warren Has Died

Wednesday 31 Jul 2024

resource

Jamie Dimon's Advice on Business Travel Is a Wake-Up Call to CEOs

Wednesday 31 Jul 2024

resource

GeoCities Dad Hat

Wednesday 31 Jul 2024

resource

The Sound of Apple

Wednesday 31 Jul 2024

resource

Telegram's Founder Plans to 'Open Source His DNA'

Wednesday 31 Jul 2024

resource

Character Spacing Bypass in Prompt-Guard-86M Classifier

Wednesday 31 Jul 2024

resource

AI Powered Home School

Wednesday 31 Jul 2024

resource

Uber deal could add 100k ridehailing EVs to roads

Wednesday 31 Jul 2024

resource

NIFC: When all the West is on fire at once, this is who deals with it

Wednesday 31 Jul 2024

resource

Thermometer: Towards Universal Calibration for Large Language Models

Wednesday 31 Jul 2024

resource

Regular Glucosamine Use and Mortality

Wednesday 31 Jul 2024

resource

Why I Finally Quit Spotify

Wednesday 31 Jul 2024

resource

Account Abstraction is better on Starknet?

Wednesday 31 Jul 2024

resource

Unbound – Validating, Recursive, Caching DNS Resolver

Wednesday 31 Jul 2024

resource

Git Cheatsheet

Wednesday 31 Jul 2024

resource

Lessons Learned from Building a Serverless Node.js API with Vercel/Neon/Prisma

Wednesday 31 Jul 2024

resource

Architecting the Data Architect: Generative AI for Enterprise Data Modeling [video]

Wednesday 31 Jul 2024

resource

Engineering team uses salt for thermal energy storage

Wednesday 31 Jul 2024

resource

Ampere AmpereOne Aurora 512 Core AI CPU Announced

Wednesday 31 Jul 2024

resource

Open ROMs

Wednesday 31 Jul 2024

resource

AI for sales reps raises $11M Series A

Wednesday 31 Jul 2024

resource

Deploy Production-Ready AIGC Apps on Kubernetes Using KubeBlocks and Dify

Wednesday 31 Jul 2024

resource

JSONPath: Query Expressions for JSON [RFC9535]

Wednesday 31 Jul 2024

resource

Maddy: Composable all-in-one mail server

Wednesday 31 Jul 2024

resource

Tiny-TPU: A Minimal Tensor Processing Unit (TPU) Inspired by Google's TPUv1

Wednesday 31 Jul 2024

resource

Squint's New Logo

Wednesday 31 Jul 2024

resource

Is Google driving us dumb? The introspective looping

Wednesday 31 Jul 2024

resource

Things I always do after installing Linux – and why

Wednesday 31 Jul 2024

resource

One year on 'Mars': Inside NASA's ultra-realistic isolation study

Wednesday 31 Jul 2024

resource

Flights cancelled due to heat wave; experts explain why flights couldn't take off

Wednesday 31 Jul 2024

resource

Government Design Principles

Wednesday 31 Jul 2024

resource

QSourcer – Find talent with AI and Boolean queries

Wednesday 31 Jul 2024

resource

Make your own luck: do this to stand out in a crowded industry [video]

Wednesday 31 Jul 2024

resource

Publish or Perish

Wednesday 31 Jul 2024

resource

Flow is the Opiate of the Mediocre (2011)

Wednesday 31 Jul 2024

resource

AMD 2024 Q2 Financials: It's All AI++ – By Dr. Ian Cutress

Wednesday 31 Jul 2024

resource

An agent to navigate previously unseen code repositories to solve queries

Wednesday 31 Jul 2024

resource

A Tour of Program Optimization [video]

Wednesday 31 Jul 2024

resource

China stops worrying about lack of GPUs and learns to love the supercomputer

Wednesday 31 Jul 2024

resource