Newwz.Space

Check out different kinds of news here.

OmniParser for Pure Vision Based GUI Agent

https://arxiv.org/abs/2408.00203...

The recent success of large vision language models shows great potential for driving agent systems that operate on user interfaces. However, we argue that the power of multimodal models like GPT-4V as general agents on multiple operating systems across different applications is largely underestimated due to the lack of a robust screen parsing technique capable of: 1) reliably identifying interactable icons within the user interface, and 2) understanding the semantics of various elements in a screenshot and accurately associating the intended action with the corresponding region on the screen. To fill these gaps, we introduce OmniParser, a comprehensive method for parsing user interface screenshots into structured elements, which significantly enhances the ability of GPT-4V to generate actions that can be accurately grounded in the corresponding regions of the interface. We first curated an interactable icon detection dataset using popular webpages and an icon description dataset. These datasets were used to fine-tune specialized models: a detection model to parse interactable regions on the screen and a caption model to extract the functional semantics of the detected elements. OmniParser significantly improves GPT-4V's performance on the ScreenSpot benchmark. On the Mind2Web and AITW benchmarks, OmniParser with screenshot-only input outperforms the GPT-4V baselines that require additional information beyond the screenshot.
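
As a rough illustration of what "parsing a screenshot into structured elements" and "grounding an action in a region" could look like, here is a minimal Python sketch. It is not the paper's implementation: the `ScreenElement` structure, the `elements_to_prompt` and `click_point` helpers, and the sample data are all hypothetical, and the detection and caption models are replaced by hand-written values.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ScreenElement:
    """One parsed UI element (hypothetical structure, not the paper's schema)."""
    element_id: int
    bbox: Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels
    description: str                 # functional caption, e.g. from a caption model
    interactable: bool               # flag, e.g. from a detection model


def elements_to_prompt(elements: List[ScreenElement]) -> str:
    """Serialize interactable elements as numbered lines for a model prompt."""
    lines = [
        f"[{e.element_id}] {e.description}"
        for e in elements
        if e.interactable
    ]
    return "\n".join(lines)


def click_point(elements: List[ScreenElement], element_id: int) -> Tuple[int, int]:
    """Map a chosen element ID back to the pixel center of its bounding box."""
    for e in elements:
        if e.element_id == element_id:
            left, top, right, bottom = e.bbox
            return ((left + right) // 2, (top + bottom) // 2)
    raise KeyError(f"no element with id {element_id}")


# Hand-written stand-ins for detector and captioner output (assumed values).
parsed = [
    ScreenElement(0, (10, 10, 110, 50), "Search button", True),
    ScreenElement(1, (0, 60, 400, 90), "Page heading text", False),
    ScreenElement(2, (120, 10, 220, 50), "Settings icon", True),
]

print(elements_to_prompt(parsed))
print(click_point(parsed, 2))
```

In this sketch the model would only need to answer with an element ID, and the bounding box lookup turns that answer into a screen coordinate. Whether the actual system works this way is an assumption here.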

Google listed my restaurant's number as its British HQ

Thursday 01 Aug 2024

This Month in Ladybird July 2024

Thursday 01 Aug 2024

Creating Custom FastHTML Tags for Markdown Rendering

Thursday 01 Aug 2024

I made a realistic fake tweet generator that's easy to use

Thursday 01 Aug 2024

Should electric mopeds be regulated like bikes or motorcycles?

Wednesday 31 Jul 2024

Using computer vision and LLMs to build a real-time personal trainer

Wednesday 31 Jul 2024

First Python Package Release. A Pydantic-Backed Solana RPC Client

Wednesday 31 Jul 2024

Build a FAST key-value store with Rust

Wednesday 31 Jul 2024

California's Highway 1 is showing the limits of man's ingenuity

Wednesday 31 Jul 2024

The Wisdom of Fish Schools

Wednesday 31 Jul 2024

Certbot Is Now on 4M Servers, Maintaining over 31M Websites

Wednesday 31 Jul 2024

Boeing Names Kelly Ortberg as Its Chief Executive

Wednesday 31 Jul 2024

Complete WordStar 7.0 Archive

Wednesday 31 Jul 2024

Brain images produced by AI are realistic, accurate to use in medical research

Wednesday 31 Jul 2024

Interactive map of U.S. road fatalities in the 21st century

Wednesday 31 Jul 2024

The Feminist Botanist

Wednesday 31 Jul 2024

Write on Your Phone

Wednesday 31 Jul 2024

Pi Stack: FOSS Stackable Raspberry Pi Housing

Wednesday 31 Jul 2024

Speed Comparison of the Most Popular Retrieval Systems for RAG

Wednesday 31 Jul 2024

What Happened at Baiae, Stayed at Baiae

Wednesday 31 Jul 2024

Faster Backups with Sharding

Wednesday 31 Jul 2024

NASA's First-Ever Quantum Memory

Wednesday 31 Jul 2024

Jamie Dimon's Advice on Business Travel Is a Wake-Up Call to CEOs

Wednesday 31 Jul 2024

Telegram's Founder Plans to 'Open Source His DNA'

Wednesday 31 Jul 2024

Character Spacing Bypass in Prompt-Guard-86M Classifier

Wednesday 31 Jul 2024

AI Powered Home School

Wednesday 31 Jul 2024

Uber deal could add 100k ridehailing EVs to roads

Wednesday 31 Jul 2024

Why I Finally Quit Spotify

Wednesday 31 Jul 2024

Account Abstraction is better on Starknet?

Wednesday 31 Jul 2024

Unbound – Validating, Recursive, Caching DNS Resolver

Wednesday 31 Jul 2024

Lessons Learned from Building a Serverless Node.js API with Vercel/Neon/Prisma

Wednesday 31 Jul 2024

Ampere AmpereOne Aurora 512 Core AI CPU Announced

Wednesday 31 Jul 2024

Maddy: Composable all-in-one mail server

Wednesday 31 Jul 2024

Squint's New Logo

Wednesday 31 Jul 2024

Is Google driving us dumb? The introspective looping

Wednesday 31 Jul 2024

Flights cancelled due to heat wave; experts explain why flights couldn't take off

Wednesday 31 Jul 2024

QSourcer – Find talent with AI and Boolean queries

Wednesday 31 Jul 2024

Publish or Perish

Wednesday 31 Jul 2024

AMD 2024 Q2 Financials: It's All AI++ – By Dr. Ian Cutress

Wednesday 31 Jul 2024