Why I’ve spent my career trying to make machines more human 

When I say that I’ve spent much of my career trying to make machines more human, I don’t mean that I’m interested in turning machines into people, or in blurring the boundary between human and artificial intelligence. What I mean is something more fundamental: I’ve been trying to understand what makes human interaction work in the first place, and how technology can participate in that interaction without flattening it. For me, this has never been only a technical problem. It has always been a philosophical one.

Long before I was working on AI products or founding companies, I was preoccupied with questions about what defines us as humans. Before university, I spent many years studying in a Jewish seminary, immersed in religious texts and modes of interpretation. That early education drew me to questions of meaning, purpose, and understanding. Religious study, at its core, is not about belief systems alone; it is about interpretation, dialogue, and the relationship between knowledge and experience. It asks how humans construct meaning, how they communicate values, and how understanding is transmitted from one mind to another. Those questions stayed with me long after I moved fully into technology. 

When I later completed a PhD in machine learning and worked on speech technologies, computer vision, and natural language processing, I didn’t see these fields as ends in themselves. I saw them as tools: ways to explore how humans perceive, interpret, and interact. Technology became a mirror. If you can build a system that humans find intuitive to interact with, you learn something about human cognition. You learn what people expect without being told, what confuses them even when information is present, and where meaning emerges not from data, but from context.

From voice to visual understanding

My first company focused on voice. At the time, the idea that a machine could hold a natural phone conversation with a human was still surprising. We built voice agents that were deployed at scale in enterprise environments, including large call centers, and the company was later acquired by Snapchat. There, I led conversational AI systems serving hundreds of millions of daily users, which gave me a unique view into how people actually interact with machines once those machines are no longer experimental.

That experience made one thing very clear: voice alone is not enough. 

Humans are not voice-first communicators. We are deeply visual. We rely on facial expressions, shared reference points, diagrams, screens, and gestures to understand each other. Even the best voice interaction strips away much of the context that makes communication efficient and trustworthy. A phone call can work, but it is never how humans communicate at their best. 

Over time, I realized that the next step was not simply better voice models or smarter language generation. What was missing was the visual layer—the ability to interact face to face, to share and respond to visual information in real time. Not to make machines feel human, but to make interaction feel complete. 

Why real time changes everything

This is where real-time avatars come in. At a basic level, a real-time avatar is a live, visual interface to AI. As on a video call, you can speak, see a face, and share visual information. The difference is that the entity on the other side is not a prerecorded video or a scripted flow. It is a system that can reason, adapt, and respond in the moment.

The distinction between real-time interaction and prerecorded content is critical. When something is prerecorded, even if it is generated by AI, the structure of the interaction has already been decided. Someone assumed what the user would need, what they would ask, and how the explanation should unfold. Real-time interaction removes that assumption. The system can decide how to respond based on what is actually happening: what the user is looking at, what they are struggling with, and how the conversation is evolving. 

This applies not only to language, but to visuals as well. A real-time avatar can choose which visual asset to show, generate new visuals on the fly, shift the explanation entirely, or switch languages instantly without interrupting the interaction. That combination of language, vision, and reasoning, happening together, in context, is what makes the experience feel intuitive rather than transactional. 
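To make that distinction concrete, here is a minimal sketch of such a turn loop. None of it is a real product API: every name in it (Observation, AvatarAction, respond) is invented for illustration, and two simple rules stand in for the model’s actual reasoning. The shape is the point: on every turn, the words, the visual, and even the language are chosen from live context rather than decided in advance.

```python
# A minimal, hypothetical sketch of a real-time avatar turn loop.
# All names are invented for illustration; nothing here is product code.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Observation:
    utterance: str       # what the user just said
    screen_text: str     # what is currently visible on the user's screen
    language: str        # language detected from the user's speech

@dataclass
class AvatarAction:
    speech: str              # what the avatar says next
    visual: Optional[str]    # a visual asset to show, if one would help
    language: str            # the language to respond in

def respond(obs: Observation) -> AvatarAction:
    """One turn: observe, reason, respond. A real system would call a
    multimodal model here; simple rules stand in for that reasoning."""
    # Match the user's language instantly, without interrupting the flow.
    lang = obs.language

    # If the user is stuck on something visible on screen, react to that
    # specific context instead of delivering a pre-scripted explanation.
    if "error" in obs.screen_text.lower():
        return AvatarAction(
            speech="I can see the error on your screen. Let's fix it together.",
            visual="annotated_screenshot",   # chosen or generated on the fly
            language=lang,
        )

    # Otherwise, answer conversationally; no visual is needed.
    return AvatarAction(speech="Good question. Here's how that works...",
                        visual=None, language=lang)

# A single turn of the loop: the response is computed, not prerecorded.
turn = Observation(utterance="Why is this failing?",
                   screen_text="Error: connection refused",
                   language="en")
print(respond(turn))
```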

The moment perception shifts

When people first encounter this kind of technology, reactions vary widely. Some are immediately excited, others are skeptical, and some are uneasy, often because their reference points come from science fiction rather than real products. But there is a very consistent moment when that uncertainty changes. 

It happens when people realize that the avatar can see what they see. 

Once users grasp that the system can observe their screen, interpret their environment, and respond to their specific context, the conversation shifts. The avatar stops feeling like a novelty or a gimmick and starts to feel like a tool. It is no longer just answering questions; it is guiding, explaining, and adapting. That contextual awareness is what turns curiosity into conviction.

Where this creates real value

The impact of this approach is most visible in areas where explanation, guidance, and understanding are central. In education, students engage more deeply when learning is visual, adaptive, and responsive to them as individuals. We have seen measurable improvements when avatars are used as tutors, not because they replace teachers, but because they provide consistent, personalized support that scales. 

In customer support and IT, the value is even more immediate. Instead of describing a problem verbally or following generic instructions, users are guided step by step by a system that sees exactly what they see. This eliminates friction, shortens resolution times, and reduces frustration. The same logic applies to onboarding, product training, and sales: any situation where understanding depends on shared context.

These benefits are not theoretical. They emerge from how human cognition actually works. We understand best when language, visuals, and interaction are integrated, not separated. Real-time avatars align with that reality. 

What comes next

As I join Kaltura, the focus is on making this capability practical at enterprise scale. That means deeply integrating real-time avatars into organizational knowledge bases and data sources, so they operate with real context and clear guardrails. The goal is not to replace human interaction, but to extend it—to make expertise available when people are not, and to support employees and customers at moments that matter. 
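As a toy illustration of what operating with real context and clear guardrails can mean, here is a deliberately simplified sketch. It is not Kaltura’s implementation, and every name in it (KnowledgeBase, answer) is hypothetical. The pattern is retrieval-grounded response: the avatar answers from the organization’s own sources, and when it finds no grounding, it declines rather than guesses.

```python
# A toy sketch of grounding plus guardrails, not any actual product code.
# All names are hypothetical; the point is that answers come from the
# organization's own sources, and missing grounding triggers a handoff.

class KnowledgeBase:
    """Stand-in for an organizational knowledge base or data source."""
    def __init__(self, documents: dict[str, str]):
        self.documents = documents

    def search(self, query: str) -> list[str]:
        # Trivial keyword match on longer words; a real system would use
        # semantic retrieval over the organization's actual content.
        words = [w.strip("?.,").lower() for w in query.split() if len(w) > 3]
        return [text for text in self.documents.values()
                if any(w in text.lower() for w in words)]

def answer(kb: KnowledgeBase, question: str) -> str:
    sources = kb.search(question)
    if not sources:
        # Guardrail: with no grounding in real data, hand off rather than guess.
        return "I don't have that information. Let me connect you to a person."
    # Ground the reply in retrieved content instead of free generation.
    return f"Based on our documentation: {sources[0]}"

kb = KnowledgeBase({"vpn": "To reset the VPN client, open Settings and choose Reset."})
print(answer(kb, "How do I reset the VPN?"))   # grounded answer
print(answer(kb, "What is our Q3 revenue?"))   # no grounding, so no guessing
```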

Looking ahead, I believe large organizations will deploy multiple avatars for different roles (customer service, onboarding, education, marketing, IT), each grounded in the organization’s real data and real workflows. At the same time, as AI becomes increasingly capable of storing and retrieving information, information itself will stop being the primary advantage. What will matter more is how people think: how they interpret, connect ideas, and exercise judgment.

Technology can support that shift, but it cannot replace it. The responsibility, then, is to design systems that respect human cognition, reduce friction, and allow people to focus on the kinds of thinking that only humans can do. 

That is what I mean when I talk about making machines more human – and why I’ve spent my career working toward it.
