On Tue, 3 Dec, 4:01 PM UTC
2 Sources
[1]
Hume launches Voice Control allowing users and developers to make custom AI voices
Hume AI, the startup specializing in emotionally intelligent voice interfaces, has launched Voice Control, an experimental feature that lets developers and users create custom AI voices through precise modulation of vocal characteristics -- no coding, AI prompt engineering, or sound design skills required. The release builds on the foundation laid by the company's earlier Empathic Voice Interface 2 (EVI 2), which introduced advanced capabilities in naturalness, emotional responsiveness, and customization.

Both EVI 2 and Voice Control avoid the risks of voice cloning, a practice that Hume co-founder Alan Cowen has said carries ethical and practical challenges. Instead, Hume focuses on providing tools for creating unique, expressive voices that align with user needs, such as customer service chatbots, digital assistants, tutors, guides, or accessibility features.

Moving beyond preset AI voices toward bespoke solutions

Voice Control lets developers adjust voices along 10 distinct dimensions, including:

"Masculine/Feminine: The vocalization of gender, ranging between more masculine and more feminine.
Confidence: The assuredness of the voice, ranging between shy and confident.
Enthusiasm: The excitement within the voice, ranging between calm and enthusiastic.
Nasality: The openness of the voice, ranging between clear and nasal.
Relaxedness: The stress within the voice, ranging between tense and relaxed.
Smoothness: The texture of the voice, ranging between smooth and staccato.
Tepidity: The liveliness behind the voice, ranging between tepid and vigorous.
Tightness: The containment of the voice, ranging between tight and breathy."

This no-code tool allows users to fine-tune voice attributes in real time through virtual onscreen sliders. It is currently available in Hume's virtual playground, which requires a free user sign-up to access.

The release addresses a key pain point in the AI industry: preset voices often fail to meet the specific needs of brands or applications, while voice cloning carries its own risks. This focus on customization aligns with Hume's broader goal of developing emotionally nuanced voice AI. The company's push to advance voice AI was highlighted in September 2024 with the launch of EVI 2, which Hume described as a significant upgrade to its predecessor: EVI 2 improved latency by 40%, reduced costs by 30%, and expanded voice modulation features, offering developers a safer alternative to voice cloning.

Sliders > text prompts

Hume's research-driven approach plays a central role in its product development. The company, co-founded by former Google DeepMind researcher Alan Cowen, uses a proprietary model built on cross-cultural voice recordings paired with emotional survey data. This methodology, rooted in emotion science, forms the backbone of both EVI 2 and the newly launched Voice Control. Voice Control extends these principles by addressing the granular, often ineffable ways humans perceive voices: the tool's slider-based interface reflects common perceptual qualities of voice, such as buoyancy or assertiveness, without trying to compress these attributes into text-based prompts.

Developer tools

Voice Control is immediately available in beta and integrates with Hume's Empathic Voice Interface (EVI), making it accessible for a wide range of applications.
Developers can select a base voice, adjust its characteristics, and preview the results in real time. This process ensures reproducibility and stability across sessions, key features for real-time applications like customer service bots or virtual assistants.

EVI 2's influence is evident in Voice Control's capabilities. The earlier model introduced features such as in-conversation prompts and multilingual support, which broadened the scope of voice AI applications. For example, EVI 2 supports sub-second response times, enabling natural and immediate conversations, and allows dynamic adjustments to speaking style during interactions, making it a versatile tool for businesses.

Differentiating in a competitive market

Hume's focus on voice customization and emotional intelligence positions it as a strong competitor in the voice AI space, even against well-funded rivals such as OpenAI, with its Advanced Voice Mode, and ElevenLabs, both of which offer libraries of preset voices. Plans for expanding Voice Control include introducing additional modifiable dimensions, refining voice quality under extreme adjustments, and increasing the range of base voices available.

With the launch of Voice Control, Hume reinforces its position as a leader in voice AI, offering tools that prioritize customization, emotional intelligence, and real-time adaptability. Developers can access Voice Control today via Hume's platform, marking another step forward in the evolution of AI-driven voice solutions.
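The developer workflow described above (pick a base voice, move sliders within a fixed range, and reuse the result across sessions) can be sketched roughly as follows. This is an illustrative example only, not Hume's actual SDK; every class, function, and voice name here is hypothetical.

```python
# Illustrative sketch only: NOT the official Hume SDK, just a plausible shape
# for the workflow the article describes (pick a base voice, move sliders,
# store the result for reuse). All names are hypothetical.
from dataclasses import dataclass, field

# The ten perceptual dimensions Hume exposes, per the articles above.
DIMENSIONS = (
    "gender", "assertiveness", "buoyancy", "confidence", "enthusiasm",
    "nasality", "relaxedness", "smoothness", "tepidity", "tightness",
)

@dataclass
class CustomVoice:
    base_voice: str                                   # one of the preset base voices
    sliders: dict = field(default_factory=dict)       # dimension -> value in [-100, 100]

    def set(self, dimension: str, value: int) -> None:
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension}")
        # Clamp to the -100..+100 range the slider UI uses.
        self.sliders[dimension] = max(-100, min(100, value))

    def to_config(self) -> dict:
        # A stable, reproducible description of the voice that could be stored
        # and reused across sessions, as the article emphasizes.
        return {"base_voice": self.base_voice, "attributes": dict(self.sliders)}

# Usage: nudge a base voice toward a calmer, more confident delivery.
voice = CustomVoice(base_voice="base_voice_1")
voice.set("confidence", 60)
voice.set("enthusiasm", -30)
print(voice.to_config())
```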
[2]
This AI Tool Will Let You Customise Voices for AI Systems
Hume, a New York-based artificial intelligence (AI) firm, unveiled a new tool on Monday that will allow users to customise AI voices. Dubbed Voice Control, the new feature is aimed at helping developers integrate these voices into their chatbots and other AI-based applications. Instead of offering a large range of voices, the company offers granular control over 10 different dimensions of voice. By selecting the desired parameters in each of the dimensions, users can generate unique voices for their apps.

The company detailed the new AI tool in a blog post. Hume stated that it is trying to solve the problem of enterprises finding the right AI voice to match their brand identity. With this feature, users can customise different aspects of how a voice is perceived, allowing developers to create a more assertive, relaxed, or buoyant voice for AI-based applications.

Hume's Voice Control is currently available in beta, but it can be accessed by anyone registered on the platform. Gadgets 360 staff members were able to access the tool and test the feature. There are 10 different dimensions developers can adjust: gender, assertiveness, buoyancy, confidence, enthusiasm, nasality, relaxedness, smoothness, tepidity, and tightness.

Instead of prompt-based customisation, the company has added a slider that goes from -100 to +100 for each of the metrics. The company stated that this approach was taken to eliminate the vagueness associated with textual descriptions of a voice and to offer granular control over the generated voices. In our testing, we found that changing any of the ten dimensions makes an audible difference to the AI voice, and the tool was able to disentangle the different dimensions correctly. The AI firm claimed that this was achieved by developing a new "unsupervised approach" which preserves most characteristics of each base voice when specific parameters are varied. Notably, Hume did not detail the source of the procured data.

After creating an AI voice, developers will have to deploy it to their application by configuring Hume's Empathic Voice Interface (EVI) AI model. While the company did not specify, the EVI-2 model was likely used for this experimental feature.

In the future, Hume plans to expand the range of base voices, introduce additional interpretable dimensions, enhance the preservation of voice characteristics under extreme modifications, and develop advanced tools to analyse and visualise voice characteristics.
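As a rough illustration of how slider-style control over mostly independent dimensions can work in principle, the sketch below scales one direction vector per attribute and adds it to a base-voice embedding. This is a generic, simplified technique for attribute control, not a description of Hume's proprietary "unsupervised approach"; every quantity in it is invented for illustration.

```python
# Conceptual sketch (not Hume's actual model): one common way to get
# slider-style control is to learn one direction per attribute in a voice
# embedding space and add scaled offsets to a base-voice embedding, so that
# moving one slider changes one perceptual quality while leaving the rest
# of the base voice largely intact.
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM = 64

# Hypothetical learned quantities: a base-voice embedding and one (ideally
# near-orthogonal) direction vector per controllable dimension.
base_embedding = rng.normal(size=EMBED_DIM)
directions = {
    name: rng.normal(size=EMBED_DIM)
    for name in ("gender", "assertiveness", "buoyancy", "confidence", "enthusiasm",
                 "nasality", "relaxedness", "smoothness", "tepidity", "tightness")
}

def apply_sliders(base: np.ndarray, sliders: dict) -> np.ndarray:
    """Map -100..+100 slider values onto the embedding via per-dimension offsets."""
    out = base.copy()
    for name, value in sliders.items():
        out += (value / 100.0) * directions[name]
    return out

# Example: a more relaxed, slightly less nasal variant of the base voice.
modified = apply_sliders(base_embedding, {"relaxedness": 70, "nasality": -20})
print(np.linalg.norm(modified - base_embedding))  # small, targeted change
```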
Hume AI introduces Voice Control, an innovative tool allowing users to create custom AI voices without coding, addressing the need for unique voice solutions in various applications.
Hume AI, a New York-based startup specializing in emotionally intelligent voice interfaces, has launched Voice Control, an experimental feature that allows users and developers to create custom AI voices without the need for coding, AI prompt engineering, or sound design skills [1]. This innovative tool addresses a significant challenge in the AI industry: the reliance on preset voices that often fail to meet specific brand or application needs.
Voice Control offers a no-code solution for fine-tuning voice attributes in real time through virtual onscreen sliders. Users can adjust voices along 10 distinct dimensions: gender (masculine/feminine), assertiveness, buoyancy, confidence, enthusiasm, nasality, relaxedness, smoothness, tepidity, and tightness [2].
These dimensions allow for precise modulation of vocal characteristics, enabling the creation of unique, expressive voices tailored to specific needs such as customer service chatbots, digital assistants, tutors, guides, or accessibility features [2].
Hume's approach is rooted in emotion science and utilizes a proprietary model based on cross-cultural voice recordings paired with emotional survey data. The company has developed an "unsupervised approach" that preserves most characteristics of each base voice when specific parameters are varied, allowing for disentanglement of different voice dimensions [2].
Voice Control integrates with Hume's Empathic Voice Interface (EVI), likely using the EVI-2 model. This integration makes it accessible for a wide range of applications, allowing developers to select a base voice, adjust its characteristics, and preview results in real-time [1].
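A minimal sketch of that deployment step might look like the following, assuming a saved custom voice is referenced by an identifier inside an EVI-style configuration. The payload shape, field names, and identifiers are hypothetical assumptions for illustration, not Hume's documented API.

```python
# Hypothetical deployment sketch (field names and identifiers are invented,
# not Hume's documented API): once a custom voice is saved, the summary above
# says it is attached to an Empathic Voice Interface (EVI) configuration,
# which the application then uses for its real-time sessions.
import json

def build_evi_config(config_name: str, custom_voice_id: str, model_name: str) -> str:
    """Assemble an EVI-style configuration payload referencing a saved custom voice."""
    payload = {
        "name": config_name,
        "voice": {"custom_voice_id": custom_voice_id},  # the voice built with Voice Control
        "model": model_name,                            # assumed to be an EVI-family model
    }
    return json.dumps(payload, indent=2)

print(build_evi_config("support-bot", "voice_abc123", "evi-2"))
```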
Hume's focus on voice customization and emotional intelligence positions it as a strong competitor in the voice AI space. Unlike companies such as OpenAI and ElevenLabs, which offer libraries of pre-set voices, Hume provides tools for creating unique, expressive voices that align with specific user needs [1].
Hume plans to expand Voice Control's capabilities by increasing the range of available base voices, introducing additional interpretable dimensions, improving the preservation of voice characteristics under extreme modifications, and developing advanced tools to analyze and visualize voice characteristics [2].
The launch of Voice Control represents a significant step forward in the evolution of AI-driven voice solutions. By prioritizing customization, emotional intelligence, and real-time adaptability, Hume is addressing key pain points in the industry and offering a safer alternative to voice cloning [1]. This development could potentially reshape how businesses and developers approach voice AI integration in their applications, leading to more personalized and brand-aligned voice interfaces across various sectors.