Question 1

How does Sidekick AI's screen vision actually work?

Accepted Answer

Sidekick captures frames from your gameplay window on a regular cadence and runs them through a vision-language model with a game-aware coaching prompt. The model identifies what's on screen — the game, the scene, the active mechanic, the player state — and passes that structured understanding to the coaching layer. The coaching layer decides what (if anything) to say. The result is voice tips that match what's actually happening, not generic advice.
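The flow described above (capture, analyze, then decide whether to speak) can be sketched roughly as follows. This is a minimal illustration, not Sidekick's actual code: `SceneState`, `coach`, `run_pipeline`, and `fake_analyze` are hypothetical names, and the vision-language model is replaced by a stand-in function so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class SceneState:
    """Hypothetical structured output of the vision step."""
    game: str
    scene: str
    mechanic: str
    player_state: str


def coach(state: SceneState) -> Optional[str]:
    """Decide what, if anything, to say. None means stay quiet."""
    if state.player_state == "low_health":
        return f"Low health during {state.mechanic}: back off and heal."
    return None


def run_pipeline(frames: Iterable[bytes],
                 analyze: Callable[[bytes], SceneState]) -> List[str]:
    """Run each captured frame through analysis, then the coaching step."""
    tips: List[str] = []
    for frame in frames:
        state = analyze(frame)   # vision step: pixels -> structured state
        tip = coach(state)       # coaching step: state -> optional tip
        if tip is not None:
            tips.append(tip)
    return tips


def fake_analyze(frame: bytes) -> SceneState:
    """Stand-in for the vision-language model (assumption for this sketch)."""
    hurt = frame == b"hurt"
    return SceneState("Elden Ring", "boss arena", "boss fight",
                      "low_health" if hurt else "healthy")


tips = run_pipeline([b"ok", b"hurt"], fake_analyze)
```

Keeping the "what is on screen" step separate from the "what to say" step is what lets the coaching layer stay silent when a frame gives it nothing useful to add.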

Question 2

Which games does screen vision work with?

Accepted Answer

Any PC game that runs in a standard window. There's no per-game integration required because the vision layer reads the rendered screen rather than the game's internal state. Single-player and co-op titles are where the experience is sharpest because the coaching content is tuned for them — Elden Ring, Baldur's Gate 3, Hollow Knight, Dark Souls 3, Resident Evil 4, Silent Hill 2, Lethal Company, Phasmophobia, Minecraft, and more.

Question 3

Does screen vision slow down my game?

Accepted Answer

There is no measurable impact on most setups. Frame capture is lightweight and happens outside the game's render loop, and the vision analysis runs on a separate thread or device. Sidekick is designed so your frame rate and input latency stay where they are: coaching is the value, not the bottleneck.

Question 4

Can Sidekick see HUD elements, menus, and inventory screens?

Accepted Answer

Yes. The vision layer reads the whole rendered frame, including UI elements like health bars, mini-maps, inventory grids, and dialog boxes. This is how Sidekick can say things like "you're at 30% health, back off" or "that spell scroll in the loot drop is worth picking up."
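As an illustration of how a HUD reading could turn into a tip like the health example above, here is a small sketch. The function name `health_tip` and the 30% cutoff are assumptions made for this example; the real thresholds and wording are not documented here.

```python
from typing import Optional

# Assumed cutoff for this sketch, not a documented Sidekick value.
LOW_HEALTH_THRESHOLD = 0.30


def health_tip(current_hp: int, max_hp: int) -> Optional[str]:
    """Return a warning when health read from the HUD is at or below the cutoff."""
    if max_hp <= 0:
        return None  # the health bar was not readable in this frame
    fraction = current_hp / max_hp
    if fraction <= LOW_HEALTH_THRESHOLD:
        return f"You're at {round(fraction * 100)}% health, back off."
    return None
```

Calling `health_tip(30, 100)` yields the warning, while `health_tip(80, 100)` returns `None` and the companion says nothing.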

Question 5

What about spoilers? Will Sidekick reveal late-game content?

Accepted Answer

The coaching layer is tuned to talk about what's on screen right now, not to volunteer information about content you haven't reached. If a story beat is about to trigger, Sidekick won't pre-empt it. If you actively ask for help on a puzzle whose solution involves later content, the companion can warn you and let you decide.
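The spoiler behavior described above can be pictured as a small gate in front of each hint. This is a hedged sketch only: `Hint`, `respond`, and the `spoiler_ok` flag are hypothetical names, and how Sidekick actually tags later-game content is not specified here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hint:
    text: str
    reveals_later_content: bool  # assumed tag, set upstream in this sketch


def respond(hint: Hint, player_asked: bool,
            spoiler_ok: bool = False) -> Optional[str]:
    """Gate a hint so later-game content is never volunteered."""
    if not hint.reveals_later_content:
        return hint.text          # safe: only about what is on screen now
    if not player_asked:
        return None               # never pre-empt content unprompted
    if not spoiler_ok:
        return "This hint touches later content. Want it anyway?"
    return hint.text              # the player asked and confirmed
```

The ordering matters: the unprompted case is checked before anything else, so a spoiler can only surface after the player both asks and confirms.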

Question 6

Is screen vision different from streaming or screen recording?

Accepted Answer

Yes. Streaming tools capture and broadcast your screen to a public audience. Screen recording saves your screen to a local file. Sidekick's screen vision captures frames in real time so the coaching layer can act on the moment — the goal is voice tips in your headset, not a broadcast or a saved video. Sidekick is built to coexist with your existing streaming setup rather than replace it.

Question 7

Does screen vision work on multi-monitor setups?

Accepted Answer

Yes. You select which window or display Sidekick reads. The companion only sees the surface you point it at, so a second monitor with Discord, OBS, or a wiki tab stays private.

Question 8

Can I turn screen vision off temporarily?

Accepted Answer

Yes. There's a clear toggle to pause vision capture. When vision is paused, the companion still talks but stops referencing the screen — useful for cutscenes, story moments, or when you just want company without coaching.
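One way to picture the pause toggle: the companion keeps replying, but screen context is simply left out while vision is paused. The `Companion` class and its methods below are hypothetical names used for illustration, not Sidekick's real interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Companion:
    vision_paused: bool = False

    def toggle_vision(self) -> bool:
        """Flip the pause state and return the new value."""
        self.vision_paused = not self.vision_paused
        return self.vision_paused

    def reply(self, message: str, screen_summary: Optional[str]) -> str:
        """Reply with screen context only when vision is active."""
        if self.vision_paused or screen_summary is None:
            return f"(chat only) {message}"
        return f"(seeing: {screen_summary}) {message}"


buddy = Companion()
with_vision = buddy.reply("Nice dodge.", "boss arena")
buddy.toggle_vision()  # pause vision, for example during a cutscene
without_vision = buddy.reply("Enjoy the scene.", "cutscene")
```

In this sketch the screen summary is ignored even if one is passed in while paused, which mirrors the described behavior of the companion no longer referencing the screen.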

Question 9

Does Sidekick capture frames via DLL injection or anything anti-cheat would flag?

Accepted Answer

No. Sidekick reads the game window the same way OBS or any screen-capture tool does — using the operating system's standard window-capture APIs. There's no DLL injection, no game-memory hook, no driver-level instrumentation. Anti-cheat systems see Sidekick as a regular desktop application, because that's what it is. The companion never touches the game's process.

Question 10

Can I run Sidekick alongside a streaming setup like OBS?

Accepted Answer

Yes. Sidekick captures from the game window; OBS captures from whatever scene you've set up. They don't compete for the same surface, and both can run simultaneously without one breaking the other. If you want the companion on stream, you can add the avatar window to OBS as a window source and Sidekick's voice output as an audio source.

Screen Vision

How It Works

Reads your screen frame by frame

Identifies game state, not just pixels

Acts on the moment, not the description

Lightweight on-device gate, server-side analysis

Why screen vision is the headline feature

What “reads your screen” means in practice

How Sidekick differs from Character.AI, ChatGPT, and Replika

How the vision pipeline actually works

What it sees during play

Privacy and control — what the companion does and doesn't see

Frequently Asked Questions
