The Evolution of HCI

It's not slowing down anytime soon

Whether you’re an engineer, a designer, a member of the support team, or a product manager, how customers interact with your product drastically impacts the knowledge and skills you need to succeed.

While the core fundamentals of usability translate fairly well across a wide variety of products, understanding what makes a product not just usable but delightful to use is something every function needs to grasp and value. That's not me saying that product teams building connected devices should only hire people with hardware experience, or that designing a great voice-to-voice interaction requires in-depth experience in that space.

It's more of a challenge to keep learning and to dive deep into what it means to create a great product experience in any medium or format. One of the best ways to learn about new technologies and opportunities is to understand where we are now and how we got here. As I write this, new products are launching that change the paradigms of modern usability, and I've found immense value in being able to tell the difference between true innovation and repackaged solutions when judging whether a product is interesting, marketable, and sellable.

A History of Human-Computer Interaction

Human-Computer Interaction (HCI) has profoundly shaped our interaction with technology. It's an interdisciplinary field that emerged alongside the rise of personal computing, blending elements from computer science, cognitive psychology, design, and ergonomics. Initially, HCI focused on making systems efficient and functional, but with time, it has pivoted towards enhancing user experience, emphasizing ease of use, understanding, and user satisfaction.

From the early command-line interfaces requiring specific commands and technical knowledge, HCI evolved into graphical user interfaces (GUIs). This shift marked a significant milestone, transforming computers from specialist tools to everyday essentials. GUIs introduced visual elements like icons and windows, making interactions more intuitive and accessible.

This evolution reflects a broader trend in HCI: a constant pursuit to make technology more user-friendly and aligned with human needs and behaviors. As we progress, HCI continues to integrate more sophisticated technologies, such as voice recognition, touch interfaces, and augmented reality, further blurring the lines between humans and computers. The journey of HCI is not just a tale of technological advancement but also a narrative of adapting technology to human life, making it an integral, seamless part of our daily existence.

Looking Back: A Brief History of The GUI

I learned a lot while researching this piece in particular. Having grown up in the '90s, I learned how to use disk drives, CD-ROMs, Windows, and DOS to get to games like Doom or Lemmings, and I didn't spend a lot of time thinking about the magic of the tech behind it.

The foundation of Human-Computer Interaction (HCI) is deeply rooted in the development of the Graphical User Interface (GUI). The GUI's journey began with early dynamic information devices, like radar displays, which allowed the computer to display the data that was streaming into it.

The 1980s saw significant developments in GUIs. The Three Rivers PERQ was introduced as the first commercially produced personal workstation with a GUI. Apple's Lisa, released in 1983, became the first mass-market personal computer operable through a GUI, although it eventually fell victim to the more affordable and simplified Apple Macintosh. Microsoft entered the GUI space with Windows 1.0 in 1985.

By the 2000s, GUIs had become integral to various devices, from personal computers to smartphones. It's been a bit of a blur in terms of the constantly changing interfaces we adapt to across the devices we use daily, but I'm here for it.

Because we only see subtle iterations in our most-used interfaces, we tend to underestimate their growing complexity. Look at some of the most basic hardware we use to get a sense of how fast things change.

  1. Laptops and Trackpads: The introduction of laptops brought computing into a portable realm. But it wasn't just about mobility; the way users interacted with these devices also changed. Trackpads, for instance, became a key component of laptop UIs. Unlike the mouse, trackpads offered a more integrated and compact way to interact with the GUI, unlocking gestures like tapping, swiping, and pinching.

  2. Touchscreens and Tablets: The emergence of touchscreens and tablets marked a significant leap in UI complexity. Touchscreens eliminated the need for peripheral input devices altogether, allowing users to interact directly with the display. This shift paved the way for multi-touch gestures, which brought a new level of intuitiveness and engagement to UIs. Pinching to zoom, swiping to navigate, and tapping to select became commonplace interactions.

  3. Smartphones and Wearables: Smartphones have arguably been the most influential in shaping modern UI complexity. With limited screen real estate, UI design had to become more innovative and efficient. The development of UIs for smartphones led to the creation of new design paradigms, such as the hamburger menu and home screen widgets. Moreover, the integration of sensors like gyroscopes and accelerometers allowed for context-aware UIs that respond to the device's orientation and movement. Wearables like smartwatches further push the boundaries of UI complexity. Given their small screens, UIs on these devices must convey information succinctly and allow for easy interaction, often through a combination of touch and physical buttons — also leading to increased adoption of voice interaction, but we'll get to that later.

And Then Came Voices

Conversational interfaces, including voice assistants like Siri, Alexa, and Google Assistant, have become integral to our daily lives. The evolution of these technologies, from Siri's introduction in 2011, has only accelerated with the emergence of GenAI. We now have rapid growth in the text-to-speech and speech-to-text arena, with companies like DeepGram and Elevenlabs bringing voice interactions to LLMs. We are also seeing a drive to create more realistic and human capabilities in products like Hume and voice-first creator tools emerging like Play.ai.

Note: When I started writing this post a few weeks ago, Amazon's BASE TTS model was probably my top example of a model able to generate a voice with real human inflection and emotion. Unfortunately, it was mostly tied to and trained on audiobooks and very dependent on narrative writing with descriptions like, “he said with excitement.” The other examples I mentioned above have all released major updates, like voice profiles in Elevenlabs that are connected to specific emotions or Hume’s transparent sentiment indicator/analysis. We’re still not quite to the point where the voice can change dynamically based on the conversation — mostly because we’re still processing text.

I expect that in the next 3-6 months, we will see the emergence of a voice-first model or the introduction of more metadata in LLM responses to inform how the audio should be generated (if it hasn’t happened by the time I send this).
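To make that idea concrete, here's a toy sketch of what "more metadata in LLM responses" could look like. Every field and function name below is made up for illustration; none of it comes from a real TTS vendor's API, and a real pipeline would map these hints onto whatever controls the vendor actually exposes.

```python
# Hypothetical sketch: the model returns delivery hints alongside the text,
# instead of plain text that a TTS engine has to interpret on its own.

from dataclasses import dataclass, field

@dataclass
class SpokenResponse:
    text: str                      # what the assistant says
    emotion: str = "neutral"       # e.g. "excited", "apologetic" (placeholder labels)
    rate: float = 1.0              # speaking-rate multiplier
    pitch_shift: float = 0.0       # semitones up/down from the voice default
    pauses_ms: dict = field(default_factory=dict)  # word index -> pause length

def synthesize(response: SpokenResponse) -> bytes:
    """Stand-in for a TTS call that accepts prosody hints."""
    print(f"Speaking {response.text!r} as {response.emotion}, rate={response.rate}")
    return b""  # placeholder audio bytes

# The interesting shift: the LLM decides the delivery, not just the words.
reply = SpokenResponse(
    text="I found three flights that match, want me to read them out?",
    emotion="upbeat",
    rate=1.05,
    pauses_ms={4: 250},  # brief pause after "match,"
)
synthesize(reply)
```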

On a personal note, my wife’s name sounds a lot like “Siri,” so I haven’t been a big adopter of voice-command-driven automation on mobile. I do enjoy my Google Home when working out or to help answer my kid’s 10,000 questions a day, and I’ve been working on a speech-to-speech platform for sales training and practice called Luster, which officially came out of stealth mode this week!

General Awareness and Adoption: A substantial majority of consumers are familiar with voice-enabled products and devices; only about 10% of respondents in one survey were unfamiliar with them. In the US, 62% of adults use a voice assistant on some device, and over half of those people (52%) use voice commands daily.

These voice assistants offer an increasingly human-like interaction, understanding context and providing personalized responses (when they actually work). They're integrated into various devices like smartphones, smart speakers, and even smart displays, making them accessible to a broad user base. Their capabilities have expanded beyond simple tasks like playing music or providing weather updates to managing smart homes, making phone calls, sending messages, and more. As these assistants receive regular updates, they become smarter, more intuitive, and further integrated into our daily routines. I have high hopes and high skepticism for devices like the R1 that promise to be a true companion… assistant?

The rise of voice assistants has also spurred innovation in the tech industry, leading to rapid advancements in voice recognition technology and natural language processing. I'm not sure the technology is advancing as quickly as our users' expectations, but that's just my speculation.

Hard of Hearing?

While there’s a lot to be excited about with voice user interfaces (VUIs), there are a number of challenges that are actively being solved to ensure the quality of the experience is where it needs to be to secure long-term adoption.

Challenges with VUI

  • Noise: One of the foremost challenges in VUI technology is functioning effectively through background noise. This issue became especially prominent during the pandemic, with the increased use of audio and video conferencing tools, and it will only become a bigger problem as more and more people rely on voice interactions for more tasks at a higher frequency. Imagine walking down the sidewalk, trying to give your assistant instructions while 8 other people around you are doing the exact same thing. It goes beyond recognizing the voice and will require more accurate identification and isolation, spatial awareness, and major improvements to accuracy. We’ll hopefully also see voice-native models emerge so we don’t have to do silly things like tell an LLM to ignore spelling when it’s wired up for voice-to-voice; there’s a rough sketch of that kind of prompt workaround after this list. (Seriously, one kept telling me I was pronouncing its name wrong… 😠)

  • Accuracy: Despite the surge in adoption of voice assistants, user frustration is still a thing. Users often express dissatisfaction with the understanding, reliability, and accuracy of digital assistants. For instance, a study highlighted that 62% of users were frustrated with the lack of understanding by voice assistants. This frustration is further amplified in the case of children, whose speech patterns and language structures differ significantly from adults.
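About that "ignore spelling" workaround from the noise bullet: here's roughly what I mean. The prompt wording and the call_llm stub below are mine, not from any specific product; the point is simply that a text-only model wired into a voice pipeline has to be told its input is an imperfect transcript.

```python
# Hypothetical sketch of prompting a text LLM to behave sensibly on ASR output.

VOICE_PIPELINE_SYSTEM_PROMPT = """\
The user's messages are automatic speech-to-text transcripts.
Do not comment on spelling, capitalization, or homophones (e.g. "there"/"their"),
and never tell the user they are pronouncing a word or name incorrectly.
If a transcript is ambiguous, ask a short clarifying question instead."""

def call_llm(system_prompt: str, user_transcript: str) -> str:
    """Placeholder for whatever chat-completion client you actually use."""
    return f"[model reply to: {user_transcript}]"

# The transcript arrives with typical ASR artifacts; the system prompt keeps
# the model from treating them as user mistakes.
reply = call_llm(VOICE_PIPELINE_SYSTEM_PROMPT, "hey assistant whats on my calender today")
print(reply)
```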

Speech is REALLY complex

I have a new favorite trial project for potential new hires on my team: asking them to write requirements for creating a human-to-human interaction. It might be a game, a conversation, or something else, but it’s an incredible test of their ability to empathize and think deeply about what impacts the end user’s perception of what feels “human” without crash-landing in uncanny valley territory.

My brain almost exploded when I went through this exercise myself. Here’s why:

  • How someone says something makes a big difference in how it is interpreted.

  • A person’s body language adds complexity to the meaning of what they are saying, but it can also be distracting or come across as rude to the person listening.

  • A change in tone is an important signal in human interactions; determining whether it’s a good or bad signal is a whole different level of logic and processing.

  • People communicate quickly, but not so quickly that they can’t process what the other person is saying or consider what they want to say next.

  • Mirroring someone else’s emotions isn’t always a good idea. Sometimes, you want to counter their mood/emotion, sometimes, you want to match it, and sometimes, it’s better not to address it at all.

I could go on and on, but I think you get it.
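If it helps make that concrete, here's a toy sketch of how a few of those bullets could start turning into explicit, testable requirements. Every name and the deliberately naive policy below are placeholders I made up for illustration; this isn't Luster's design or anyone's real spec.

```python
# Toy sketch: "sounds human" decomposed into explicit, inspectable signals.

from dataclasses import dataclass
from enum import Enum

class EmotionStrategy(Enum):
    MATCH = "match"      # mirror the speaker's mood
    COUNTER = "counter"  # e.g. stay calm when they're frustrated
    IGNORE = "ignore"    # respond to the content, not the emotion

@dataclass
class TurnRequirements:
    detected_emotion: str        # from sentiment/prosody analysis
    tone_shift: bool             # did their tone change vs. the last turn?
    max_response_delay_ms: int   # people talk fast, but not instantly
    strategy: EmotionStrategy

def pick_strategy(detected_emotion: str, tone_shift: bool) -> EmotionStrategy:
    """Deliberately naive policy: mirroring isn't always the right move."""
    if detected_emotion == "frustrated":
        return EmotionStrategy.COUNTER
    if tone_shift:
        return EmotionStrategy.MATCH
    return EmotionStrategy.IGNORE

req = TurnRequirements(
    detected_emotion="frustrated",
    tone_shift=True,
    max_response_delay_ms=800,
    strategy=pick_strategy("frustrated", True),
)
print(req.strategy)  # EmotionStrategy.COUNTER
```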

What’s next?

Imagine controlling your environment with just a wave of your hand or a flick of your finger. Gesture-based interfaces leverage the body's natural movements to facilitate seamless interactions with digital systems without the need for traditional input devices like keyboards or mice. This tech uses sophisticated sensors and real-time processing to read and translate physical gestures into commands. For a lot of people, this sounds like the distant future, but as many of you may know… “The future is now.”

  • These Meta Ray-Bans support gesture controls

  • This Tap 2.0 lets you connect to a variety of devices via Bluetooth to enable hand gesture controls.

  • Apple Vision Pro supports both hand and eye gestures.

  • Neuralink is now recruiting for clinical trials to let people control computers with mind bullets! I mean… their minds
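For what it's worth, the "translate gestures into commands" part usually boils down to something unglamorous once a recognizer has emitted a discrete gesture event. Here's a minimal sketch, assuming the device hands you named gesture events; the gesture names and actions are placeholders, not any vendor's actual vocabulary.

```python
# Hypothetical gesture-to-command dispatch: the hard work (sensor fusion,
# recognition) is assumed to happen upstream; this layer just maps events
# to actions.

from typing import Callable, Dict

def volume_up() -> None:
    print("volume up")

def dismiss_notification() -> None:
    print("notification dismissed")

def select_item() -> None:
    print("item selected")

GESTURE_COMMANDS: Dict[str, Callable[[], None]] = {
    "pinch": select_item,
    "swipe_left": dismiss_notification,
    "hand_raise": volume_up,
}

def handle_gesture(event: str) -> None:
    action = GESTURE_COMMANDS.get(event)
    if action is None:
        return  # unrecognized or low-confidence gesture: do nothing
    action()

handle_gesture("pinch")  # -> "item selected"
```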

While not all of us are thinking about or building for these types of interactions, it’s important to start thinking about them. Once users find something they like, it becomes a new standard and expectation for all interactions moving forward.

Think about the original iPod and how painful manually scrolling down a long list of songs felt after you experienced the fast and smooth circular scrolling.

Think about how your expectations for search engines have changed over the years. At one point, we knew we would have to dig through the results or adjust our search to find the right information. Now we’re pissed if it’s not in the top 3 results, or if the browser didn’t do all of the work for us.

What did you think of this post?

I’m doing a bit of experimentation out of the gate to see what people are into, where I need to go deeper, and where I might need to step back. It would be a tremendous help if you could let me know what you’re looking for.

What kind of content are you looking for?
