3D Avatar

Live VRM companion with emotion and gestures

Sidekick AI lives beside your game as a real 3D avatar — animated, expressive, present. The difference between consulting a tool and playing with a teammate.

Add to Steam Wishlist

How It Works

Full 3D VRM avatar

Real 3D model rendered live in a floating window beside your game. Not a portrait, not a sprite, not a static image — a body with bones, expressions, and presence.

Emotion and gestures

The avatar reacts: smiles at clutch plays, winces at deaths, points at things on your screen, leans in during boss fights. Reactions you actually feel because you can see them.

Lip sync to the voice layer

Mouth movement matches the audio. The voice tips you hear come from a face that's actually saying them, not a disembodied speaker.

Companion customization

Pick or tweak the avatar that fits the vibe you want. The companion identity is consistent across sessions instead of resetting every time.

Why presence is a feature, not decoration

The 3D avatar is the most quietly opinionated choice in Sidekick AI. Most AI products treat the visual layer as a logo — a portrait, a chat bubble, maybe a 2D character. Sidekick treats it as a teammate with a body. The difference is presence: the sense that someone is playing with you, not just speaking at you.

Presence is the thing voice alone can't deliver. Voice gives you information. The avatar gives you company. For the moments you fire up Sidekick at 2am to push through a hard boss, company turns out to matter more than information.

How reactions work

Screen vision flags notable moments: a boss kill, a clutch dodge, a death, a rare drop, a critical strike, the start of a new encounter. Those signals travel to the avatar layer alongside the voice line, and the avatar's expression, posture, and gesture pick up the moment. You die to Malenia again and the companion sighs. You finally beat her and the avatar throws its hands up. None of this is narrated; you just see it.

Reactions are timed to feel like a teammate noticing things, not a cartoon. The avatar doesn't celebrate every kill or wince at every hit. It reacts to moments that genuinely matter, which is usually fewer per session than you'd expect.
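
One way to picture that restraint is a gate that scores each flagged moment and enforces a quiet period between reactions. The sketch below is illustrative only: the event names mirror the examples above, but the scores, the threshold, the cooldown, and the `ReactionGate` class itself are assumptions, not Sidekick's actual tuning.

```python
from dataclasses import dataclass, field

# Hypothetical significance scores (0.0 to 1.0) per flagged moment.
SIGNIFICANCE = {
    "boss_kill": 1.0,
    "rare_drop": 0.8,
    "death": 0.7,
    "clutch_dodge": 0.6,
    "encounter_start": 0.5,
    "critical_strike": 0.3,
}


@dataclass
class ReactionGate:
    threshold: float = 0.5    # moments scoring below this are ignored
    cooldown_s: float = 20.0  # minimum gap between two reactions, in seconds
    _last_reaction_at: float = field(default=float("-inf"), init=False)

    def should_react(self, event: str, now_s: float) -> bool:
        """Return True if the avatar should visibly react to this event."""
        score = SIGNIFICANCE.get(event, 0.0)
        if score < self.threshold:
            return False
        in_cooldown = now_s - self._last_reaction_at < self.cooldown_s
        # Only top-scoring moments (score 1.0) are allowed to break the cooldown.
        if in_cooldown and score < 1.0:
            return False
        self._last_reaction_at = now_s
        return True
```

With these assumed numbers, a routine critical strike never triggers a reaction, a death does, a clutch dodge five seconds later is skipped, and a boss kill always gets through.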

Avatar versus portrait — the category difference

Other AI companions in the gaming space lean on 2D portraits, static character art, or stylized chat windows. Those work for chat. They don't deliver the felt experience of having a teammate present in your session. A 3D VRM avatar that animates, expresses, and reacts is a meaningfully different product shape: closer to VTubing than to a chatbot, closer to a co-op partner than to a tool. The longer comparisons live at Sidekick AI vs Character.AI and Sidekick AI vs Questie.ai.

That said, if the visual layer isn't what you want, the product is fully usable as voice-only. Hide the avatar window and Sidekick is just a voice in your headset. The 3D layer is opt-in for the players who want it, not a tax for the players who don't.

How the 3D rendering actually works

The avatar is a full VRM model — bones, blend shapes, expression rigs, the same format VTubers use to push real-time animation. It isn't prerendered video, isn't a sequence of 2D images, isn't a stylized chat bubble. The companion you see in the floating window is a live 3D character with a skeleton, animated in real time as Sidekick decides what to do.

Rendering happens in a separate window from your game. That gives the avatar a few useful properties. The avatar window runs at its own frame rate, so it stays smooth even if your game stutters. The rendering pipeline doesn't touch your game's memory or hook into its render loop — anti-cheat systems see Sidekick as a regular desktop app, because that's what it is. GPU footprint is small relative to a modern AAA title — negligible on a rig capable of running Elden Ring at 60 FPS.

Animation comes from two sources. The avatar reacts to screen-vision signals (boss telegraphs, deaths, big crits, encounter starts) and to voice events (mouth shapes match the audio being spoken). Both are timed to feel like a teammate noticing things, not a constant cheerleader. The result is presence at frame rate, not a static portrait pretending to be alive.
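
As a rough illustration of how two animation sources could be combined on each frame, the sketch below lets speech own the mouth shapes while reactions drive everything else. The expression names (`happy`, `aa`, `ih`, `ou`, `ee`, `oh`) follow the VRM 1.0 expression presets; the function name, the `speaking_damping` parameter, and the merging rule are assumptions made for the example, not a description of Sidekick's renderer.

```python
# Mouth-shape expressions, named after the VRM 1.0 viseme presets.
MOUTH_CHANNELS = {"aa", "ih", "ou", "ee", "oh"}


def clamp01(value: float) -> float:
    """Keep an expression weight inside the valid 0.0 to 1.0 range."""
    return max(0.0, min(1.0, value))


def merge_expression_weights(reaction, speech, speaking_damping=0.5):
    """Merge reaction-driven and speech-driven weights into one frame.

    reaction: expression weights from screen-vision reactions (e.g. {"happy": 0.8})
    speech:   mouth-shape weights from lip sync (e.g. {"aa": 0.6})
    """
    # Speech is the only source allowed to set mouth shapes.
    mouth = {name: clamp01(w) for name, w in speech.items() if name in MOUTH_CHANNELS}
    speaking = any(w > 0.0 for w in mouth.values())
    # While speaking, soften emotional expressions so they do not fight the mouth.
    scale = speaking_damping if speaking else 1.0
    merged = {
        name: clamp01(w * scale)
        for name, w in reaction.items()
        if name not in MOUTH_CHANNELS
    }
    merged.update(mouth)
    return merged
```

Calling `merge_expression_weights({"happy": 0.8}, {"aa": 0.6})` returns `{"happy": 0.4, "aa": 0.6}`: the smile is still visible, but the mouth follows the audio.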

Personalities — the companion that fits your play style

The 3D avatar isn't a generic mannequin with a voice slapped on. Each available companion ships with a personality that determines how it speaks, what it reacts to, how visibly it reacts, and what kind of teammate it is during a session. Nova (calm & strategic) delivers a clean “dodge right” with a small head-tilt. Luna (hype & energetic) yells “DODGE RIGHT” and throws both hands up. Mika (gentle & supportive) delivers the same callout calmly and reassuringly. Same coaching information, three different teammates.

The other three personas each handle a different play style. Kaze (cool & mysterious) holds the same callout for one extra second before speaking, so when Kaze tells you something, you listen. Aura (sharp & precise) calls timing windows by the frame: “dodge now: three frames, you have a window.” Ren (bold & fearless) pushes you off the back foot: “heal-and-retreat is the trap here: punish the recovery, you have the opening.”

Six personas ship today (Nova, Luna, Kaze, Aura, Mika, and Ren), each with distinct body language, voice, and reaction style. The full visual gallery lives at the Avatar Gallery, where you can scroll through the cast and pick the one that fits how you want to be coached.

Persona shapes the experience because the avatar's body language and tone do most of the emotional work, not just the words. The voice and the visuals are designed together so the companion reads as one character rather than a portrait speaking borrowed lines. The bet is that one consistent persona kept across a long playthrough builds the same teammate feeling Souls players get from a real co-op partner. Compare this to Questie.ai's marketplace of swappable characters: different bet, different product shape.

Every persona is stream-safe by default. There's no toxic mode, no off-tone mode, no “dark turn” persona. Roleplay-style relationship modes that ship in some other AI-companion products aren't part of Sidekick's persona set, by design. The product shape is teammate; the personas are different flavors of that single shape.

Voices — the audio half of the avatar

Sidekick is voice-first. The avatar's mouth moves, but the audio in your headset is the primary channel — that's what calls boss timing, points at exits in metroidvanias, and narrates highlight clips for HypeReel. The 3D avatar exists because pairing voice with a visible face is how presence works. Mouth shapes (visemes) match the audio in real time; the avatar that speaks the line is the avatar you see speaking it.
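
To make the viseme idea concrete, here is a minimal sketch of looking up the mouth shape for a given moment in the audio. It assumes the voice layer provides a timeline of phonemes with start and end times, which this page does not confirm. The ARPAbet-style phoneme symbols, the simplified `PHONEME_TO_VISEME` table, and the `viseme_at` helper are all hypothetical; the target names are the VRM 1.0 mouth presets.

```python
# Hypothetical, heavily simplified vowel-to-viseme table.
# Keys are ARPAbet-style phonemes; values are VRM 1.0 mouth preset names.
PHONEME_TO_VISEME = {
    "AA": "aa", "AE": "aa",
    "IY": "ih", "IH": "ih",
    "UW": "ou",
    "EH": "ee",
    "OW": "oh", "AO": "oh",
}


def viseme_at(timeline, t):
    """Return the viseme active at time t (seconds), or None for a closed mouth.

    timeline: list of (start_s, end_s, phoneme) tuples in playback order.
    Phonemes missing from the table (most consonants here) map to None.
    """
    for start_s, end_s, phoneme in timeline:
        if start_s <= t < end_s:
            return PHONEME_TO_VISEME.get(phoneme)
    return None


# "dodge" as three timed phonemes: D, AA, JH (timings are made up).
dodge = [(0.00, 0.10, "D"), (0.10, 0.25, "AA"), (0.25, 0.30, "JH")]
```

Sampling `viseme_at(dodge, 0.15)` gives `"aa"`, while any time outside the vowel gives `None`. A real lip-sync layer would also blend between shapes rather than switching instantly.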

Voice character is tied to persona character. Nova sounds analytical and even-toned; Luna sounds energetic and louder; Mika sounds reassuring and patient; Kaze speaks in short precise bursts; Aura reads timing windows with frame-level clarity; Ren projects confidence and pushes you forward. The same persona keeps the same voice across sessions — not a different voice every time you launch. Voice tone, voice rhythm, voice volume, and the avatar's body language are tuned together so a session feels like playing with one teammate, not a portrait reading a script.

Language coverage is broad and native rather than translated. Sidekick's multilingual layer ships dedicated voices for Portuguese, Spanish, Russian, Mandarin, Arabic, Italian, Japanese, Korean, French, and German alongside English — not one base voice speaking each language with an accent. The avatar's lip sync works against whichever language voice is active. Game terms (boss names, item names, quest names) stay in the original game language so a wiki cross-reference still works; the explanations around them are localized.
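
Keeping game terms in the original language while localizing the rest can be approximated by masking the terms before translation and restoring them afterwards. The sketch below shows that general technique under stated assumptions: the `[[TERMn]]` placeholder format, the helper names, and the `translate` callable are invented for illustration and say nothing about how Sidekick's multilingual layer is built.

```python
def protect_terms(text, terms):
    """Replace each game term with a numbered placeholder.

    Returns the masked text and a placeholder-to-term mapping.
    Longer terms are masked first so a short term cannot split a long one.
    """
    mapping = {}
    for index, term in enumerate(sorted(terms, key=len, reverse=True)):
        token = f"[[TERM{index}]]"
        if term in text:
            text = text.replace(term, token)
            mapping[token] = term
    return text, mapping


def restore_terms(text, mapping):
    """Put the original game terms back in place of their placeholders."""
    for token, term in mapping.items():
        text = text.replace(token, term)
    return text


def localize_with_terms(text, terms, translate):
    """Translate text while leaving the listed game terms untouched.

    translate: any callable taking and returning a string (hypothetical here).
    """
    masked, mapping = protect_terms(text, terms)
    return restore_terms(translate(masked), mapping)
```

Given the line "Dodge right when Malenia starts Waterfowl Dance." and the terms "Malenia" and "Waterfowl Dance", the translator only ever sees "Dodge right when [[TERM1]] starts [[TERM0]].", so the names come back unchanged and still match the wiki.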

Frequently Asked Questions

What is a VRM avatar?

VRM is an open standard for 3D humanoid avatars, widely used in VTubing, virtual worlds, and gaming. Sidekick uses VRM because it suits real-time rendered companions: fully rigged, expression-capable, and well-supported. Your avatar isn't a video; it's a live 3D model that animates in response to what's happening.

Where does the avatar live during gameplay?

In a floating, resizable window beside your game window. You can dock it to a corner, drop it on a second monitor, or shrink it to a thumbnail when you want a minimal visual footprint. The avatar never overlays gameplay unless you explicitly want it to.

Does the 3D avatar slow down my game?

On any modern gaming PC, no. VRM rendering is light compared to what your game is doing. Sidekick is designed so the avatar window stays smooth even when your game is pushing your GPU.

Why does a 3D avatar matter? Couldn't I just use voice?

You could. Voice alone works fine functionally. The 3D avatar adds presence: the difference between hearing a tip and playing alongside someone. Looking over and seeing a teammate reacting to your plays is a different kind of company than hearing a voice from nowhere. Whether that matters to you is a personal call, and you can shrink or hide the avatar window if it doesn't.

Can I bring my own VRM model?

Not in the initial release. VRM is an open format, so this is a natural roadmap direction. The initial release ships a set of curated avatars that look right and behave consistently. Bring-your-own-VRM is the kind of customization that's easier to add once the baseline product is dialed in.

What if I stream? Does the avatar need to be off-camera?

Most streamers will want the avatar on camera. Sidekick's 3D companion is built stream-safe by design: appropriate poses, no inappropriate content, gaming-focused personality. Add the avatar window to OBS as a Window Capture source and you have an on-screen companion for your viewers.

How does the avatar know when to react?

Through the same pipeline that drives the coaching voice. Screen vision recognizes notable moments (boss kills, deaths, big crits, rare drops) and the avatar gets reaction signals along with whatever the voice says. Reactions are tuned to be timely, not constant.

Can I turn off the avatar entirely?

Yes. Voice-only mode hides the avatar window completely. You still get the full real-time coaching experience, just without the visual companion. The 3D layer is a feature, not a requirement.

How does Sidekick's 3D avatar compare to Questie.ai or Character.AI on the visual layer?

Questie.ai and Character.AI both lean on 2D portraits or stylized chat-window characters. Those work for chat. They don't deliver the same felt experience as a full 3D body that animates beside live gameplay. Sidekick's VRM avatar reacts to screen-vision signals from the game, has lip sync that matches the voice, and lives in a separate window from your game. The visual layer is a product-shape difference, not a polish difference.

Can I choose a different voice for the avatar?

Yes, by choosing a different persona. Each available persona ships with a matching voice: Nova sounds analytical, Luna sounds energetic, Mika sounds reassuring. The persona and voice are designed as a bundle, so picking a persona picks the voice that fits its body language. Bring-your-own-voice cloning isn't part of the initial release.

Does the avatar speak languages other than English?

Yes. Sidekick ships dedicated native voices for Portuguese, Spanish, Russian, Mandarin, Arabic, Italian, Japanese, Korean, French, and German, plus English, rather than one English voice with accents. The 3D avatar's lip sync works against whichever language voice is active. Game-specific terms (boss names, item names) stay in the original game language so you can cross-reference a wiki; the explanations around them are localized.

Ready to play smarter?

Sidekick AI uses vision AI to watch your screen and coach you in real time. Try the free demo on Steam.
