The era of open voice assistants
We need more projects like Home Assistant. I started using it recently and was amazed. They sell their own hardware but the whole setup is designed to work on any other hardware. There are detailed docs for installation on your own hardware. And it works amazingly well.
Same for their voice assistant. You can buy their hardware and get started right away or you can place your own mics and speakers around home and it will still work. You can buy your own beefy hardware and run your own LLM.
The possibilities with Home Assistant are endless. Thanks to this community for breaking the barriers created by big tech.
It's too bad it's sold out everywhere. I've tried the ESP32 projects (little cube guy) for voice assistants in HA but its mic/speaker weren't good enough. When it did hear me (and I heard it) it did an amazing job. For the first time I talked to a voice assistant that understood "Turn off office lights" to mean "Turn off all the lights in the office" without me giving it any special grouping (like I have to do in Alexa and then it randomly breaks). It handled a ton of requests that are easy for any human but that Alexa/Siri trip up on.
I cannot wait to buy 5 or more of these to replace Alexa. HA is the brain of my house and up till now Alexa provided the best hardware to interact with HA (IMHO) but I'd love something first-party.
If it's possible for the hardware to facilitate a use case, the employees working on the product will try to push the limits as far as they possibly can in order to manufacture interesting and challenging problems that will get them higher performance ratings and promotions. They will rationalize away privacy violations by appealing to their "good intentions" and their amazing ability to protect information from nefarious actors. In their minds they are working for "the good guys" who will surely "do the right thing."
At various times in the past, the teams involved in such projects have at least prototyped extremely invasive features with those in-home devices. For example, one engineer I've visited with from a well-known in-home device manufacturer worked on classifiers that could distinguish between two people having sex and one person attacking another in audio captured passively by the microphones.
As the corporate culture and leadership shift over time, I have marginal confidence that these prototypes will perpetually remain undeveloped or on-device only. Apple, for instance, has decided to send a significant amount of personal data to their "Private Cloud" and is taking the tactic of opening "enough" of its infrastructure for third-party audit to make an argument that the data they collect will only be used in a way that the user is aware and approves of. Maybe Apple can get something like that to a good enough state, at least for a time. However, they're inevitably normalizing the practice. I wonder how many competitors will be equally disciplined in their implementations.
So my takeaway is this: If there exists a pathway between a microphone and the Internet that you are not in 100% control over, it's not at all unreasonable to expect that anything and everything that microphone picks up at any time will be captured and stored by someone else. What happens with that audio will -- in general -- be kept out of your knowledge and control so long as there is insufficient regulatory oversight.
That's a pretty timely release considering Alexa and the Google assistant devices seem to have plateaued or are on the decline.
Had to laugh a bit at the caveat about powerful hardware. Was bracing myself for GPU and then it says N100 lol
One thing that makes me nervous: Home Assistant has an extremely weak security model. There is recent support for admin users, and that’s about it. I’m sort of okay with the users on an installation having effectively unrestricted access to all entities and actions. I’m much less okay with an LLM having this sort of access.
An actually good product in this space IMO needs to be able to define specific sets of actions and allow agents to perform only the permitted actions.
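To make the idea above concrete, here is a minimal sketch of an allowlist gate that sits between an LLM agent and whatever actually executes actions. It is not Home Assistant's API: `Action`, `AgentPolicy`, `execute`, and the `call_service` callback are hypothetical names invented for illustration, and the domain/service/entity naming only mimics the general style HA uses.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Action:
    """One request an agent wants to perform (hypothetical shape)."""
    domain: str     # e.g. "light"
    service: str    # e.g. "turn_off"
    entity_id: str  # e.g. "light.office"


@dataclass
class AgentPolicy:
    """What a single agent is allowed to do (hypothetical, not an HA feature)."""
    # (domain, service) pairs the agent may call at all.
    allowed_services: set = field(default_factory=set)
    # Entity IDs the agent may touch.
    allowed_entities: set = field(default_factory=set)

    def permits(self, action: Action) -> bool:
        # Both the service and the target entity must be allowlisted.
        return (
            (action.domain, action.service) in self.allowed_services
            and action.entity_id in self.allowed_entities
        )


def execute(policy: AgentPolicy, action: Action, call_service):
    """Run the action only if the policy permits it; otherwise refuse."""
    if not policy.permits(action):
        raise PermissionError(
            f"agent may not call {action.domain}.{action.service} "
            f"on {action.entity_id}"
        )
    # call_service is an assumed callback that performs the real work.
    return call_service(action)
```

With a policy of `AgentPolicy({("light", "turn_off")}, {"light.office"})`, a request to turn off `light.office` goes through, while a request to unlock `lock.front_door` raises `PermissionError` before anything is executed. Denying by default keeps a confused or prompt-injected model from reaching entities nobody thought to exclude.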
I don't fully understand the cloud upsell. I have a beefy GPU. I would like to run the "more advanced" models locally.
By "I don't fully understand," I mean just that. There's a lot of marketing copy, but there's a lot I'd like to understand better before plopping down $$$ for a unit. The answers might be reasonable.
Ideally, I'd be able to experiment with a headset first, and if it works well, upgrade to the $59 unit.
I'd love to just have a README, with a getting started tutorial, play, and then upgrade if it does what I want.
Again: None of this is a complaint. I assume much of this is coming once we're past preview edition, or is perhaps there and my search skills are failing me.
I wonder how this compares to the ReSpeaker Mic Array v2.0 https://wiki.seeedstudio.com/ReSpeaker_Mic_Array_v2.0/
The ReSpeaker has 4 mics and can easily cancel out the noise introduced by a custom external speaker.
I had great trouble simply connecting a Bluetooth speaker to use it for voice input and sound output. The overall state of the sound subsystem for DIY voice assistants feels third-class at best.
Looks great! The biggest issue I see is music. 90% of my use is "play some music" but none of the major streaming music providers offer APIs for obvious reasons. I'm not sure how you can get around that really.
My wife and I have been very happy with Home Assistant so far. The one thing we're missing is voice control, and until now it seemed like there just wasn't a clean solution for HA voice control. You were stuck doing some hobbyist shenanigans and hand-writing boatloads of YAML, or you were hooking up a HomeKit/Alexa which defeats the purpose of HA. This is a game-changer.
They recommend an N100 in the blog post, but I might buy one anyway to see if my HA box's Celeron J3455 will do the job.
As someone not that familiar with HA, can someone explain why there's not a clear path to replace Alexa or Google Home? I considered using HA recently to get a GPT-like response after being frustrated with Google Home, but it seems this is a complete mess. Is there a way to get this yet?
Are there any macOS software versions of this? I've been looking for an open-source wake word for a "Hey Siri"-like integration, but I'm very apprehensive of anything, malicious or not, monitoring the sound input for a specific word in an efficient way.
If it runs fully on-premises that would be great. I'm still not comfortable buying a device that records everything I say and uploads it to a cloud.
My plea / request: Make a Home Assistant device that is a DROP-IN replacement for a standard light switch. It has power, it adds functionality from the get-go (smart lighting), it's placed in a convenient position for the room, and no extra wires etc. are required.
Is anyone aware of an effort to repurpose Echo hardware to do HA voice?
Nice. A totally local voice assistant.
This makes sense for cars, where there's much local stuff to control. But for a home unit, what do you want to do that is entirely local? Turning the heat up and down gets boring after a while. If it does entertainment selection or shopping, it needs outside world connections.
(Today's rant: I recently purchased a humidifier. It's just a little unit with a water tank, a water-softening filter, and an ultrasonic vaporizer. That part works fine. Then there are the controls.
All this thing really needs is an on-off switch and a humidity knob, and maybe lights for power, humidification, and water tank empty. But no. It has five touch buttons and a round display about four inches across. The display is on even if the unit is off. Pressing the on/off button turns it on. If it's humidifying, there's a whole light show. The tank lights up purple. Swooping arcs of blue run up both edges of the round display. It's very impressive, especially in a dark bedroom. If you press and hold the second button for two seconds, about half the light show is suppressed.
There are three fan speeds, and a button for that. Only the highest one will propel the water vapor high enough to avoid it hitting the floor and uselessly condensing before it mixes with the air. So that feature was not necessary.
The display shows one number. It's usually the current humidity, but if you press the humidity set button, the number displayed becomes the setting, which is changed upwards by successive presses until it wraps around. After a few seconds, the display reverts to current humidity.
Turning the unit off or removing the water tank resets all settings to the default.
This is the low-end unit. The next step up comes with an IR remote. It's one way - the remote has buttons but no display. Since you have to be close to the display to use the buttons effectively, that doesn't help much. The step up after that is, inevitably, a cloud-based phone app.
So this thing could potentially be interfaced to a voice assistant. That's only useful if there's enough information coming back from the device that the assistant software knows what the device is doing, and the assistant software understands that device status. If all it does is send remote button pushes, the result will be frustration.
So you need some degree of intelligence at both ends - the end that talks to the human, and the end that talks to the device. If the user says "House, it's too dry in here", the assistant system needs to be able to check the status of the humidifier. Has power? Talking? On? Humidity setting reasonable? Fan running? Tank not empty? If it can't do that, it's part of the problem, not part of the solution.)
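The checklist above ("Has power? Talking? On? Humidity setting reasonable? Fan running? Tank not empty?") can be sketched as an ordered diagnostic. Everything here is an assumption for illustration: the `HumidifierStatus` fields, the `diagnose` function, and the 40% "reasonable" threshold are made up, and a real device would have to report this state for any of it to work.

```python
from dataclasses import dataclass


@dataclass
class HumidifierStatus:
    """State the device would need to report back (hypothetical fields)."""
    has_power: bool
    responding: bool      # "Talking?" in the comment above
    is_on: bool
    target_humidity: int  # percent
    fan_running: bool
    tank_empty: bool


def diagnose(status: HumidifierStatus, min_reasonable_target: int = 40):
    """Return the first problem found, or None if every check passes.

    The 40% default is an arbitrary placeholder, not a recommendation.
    """
    # Checks run in the same order as the questions in the comment.
    checks = [
        (status.has_power, "no power"),
        (status.responding, "not responding"),
        (status.is_on, "switched off"),
        (status.target_humidity >= min_reasonable_target,
         "target humidity set too low"),
        (status.fan_running, "fan not running"),
        (not status.tank_empty, "water tank empty"),
    ]
    for ok, problem in checks:
        if not ok:
            return problem
    return None
```

An assistant that hears "it's too dry in here" could call `diagnose` and answer "the water tank is empty" instead of blindly resending a button press. A device that only accepts remote commands and reports nothing back leaves every field above unknowable.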
My experience with the Home Assistant voice pipeline is that nothing works and STT is terrible. I'll have to wait and see the reviews.
Genuine question - How hackable is this? Can I have the voice commands redirected to my backend server where I can process it as I please?
What I don't like is that most voice assistants perform really badly in my native language, so I don't use them at all. They work for English speakers, but not so much for everyone else. I guess it will get better.
Though separate hardware helps, I believe voice and automation can be integrated more seamlessly into our existing devices (phones/laptops) with high compute built in.
Llama and whisper are already public so that should help innovation in this area.
I am very excited for this. One question I couldn’t find an answer for though is whether the hardware is open enough to be usable with other home automation systems. I am using OpenHAB and they too have an integrated voice assistant. I looked into migrating to HA a couple times but eventually gave up, primarily because it felt like such a waste of time to migrate a fully working environment with dozens of rules and scripts to yaml files.
A good emphasis in the summary, that certain other companies will only focus on monetization at the expense of features and functionality.
Open as in 3d print files, rpi etc.? If so this is the project I am looking for!
Here's what I'm looking for in a voice assistant:
- Full privacy: nothing goes to the "cloud"
- Non-shitty microphones and processing: i want to be able to be heard without having to yell, repeat, or correct
- No wake words: it should listen to everything, process it, and understand when it's being addressed. Since everything is private and local, this is now doable
- Conversational: it should understand when I finished talking, have ability to be interrupted, all with low latency
- Non-stupid: it's 2024, and alexa and siri and google are somehow absolutely abysmal at doing even the basics
- Complete: i don't want to use an app to get stuff configured. I want everything to be controlled via voice
While we are getting shoveled AI keyword everywhere, I'm actually disappointed I don't see it here.
The first thought I had when encountering LLMs was that they can finally make these devices understand you and make them finally useful... and I don't need to know some prescripted keywords.
Home Assistant is such a fantastic project. I've been waiting for something like this for a long time; I just pre-ordered three.
My only remaining wish is that I can replace Siri with this (without needing some workaround)
All I want is a voice assistant that I can call "computer" like Star Trek, I don't want to have to say a brand name thank you!
And on back order everywhere. I just spent the last 2 weeks getting an esp32-s3-box set up to do this but its lack of audio out really irks me.
anyone tried https://getleon.ai/ ?
Perfect, will dig more into it. Currently I'd like to have a Spotify client without a UI for my kids ;)
Sorry if this question takes away from the great strides the team has made, but wouldn't it be much easier (hardware-wise) to jailbreak one of the existing great hardware thingies like Apple HomePod or the Google one or Alexa?
It's not clear to me from the description if this is also completely open source hardware. Are the schematics, BoM, firmware published under a permissive license? If so, where are they accessible?
And if not, I would be curious to know why it hasn't been fully open sourced.
Can someone describe the use case here? I don't quite understand what its purpose is.
Is this a fully-private, open source alternative to Alexa, that by definition requires a CPU locally to run ?
Is the device supposed to be the nerve center of IoT devices ?
Can it access the Wifi to do web crawls on command (music, google, etc)?
You should talk to Sonos about partnering with them. They currently have a very limited Sonos voice assist, plus Google Voice and Alexa, but the latter two are limited pre-LLM assistants.
I’m assuming they eventually want to create their own LLM and something privacy focused would be good match for their customers. I don’t know how they feel about open source though
RIP Mycroft. A tad too early.
I think in some ways it could redefine how we think about voice control... taking it from the cloud and putting it back into users' hands, like literally
What voices do they use?
Well shoot. Now i want to record everything in my house and transcribe it for logs. I already wanted to do that but didn't think there was a sane way.. assuming this lets me create a custom pipeline, that's wicked
It isn't even one year since the press stories about how dumb a product Alexa was and how it makes no money and all the devs are getting laid off. Something changed now?
Not super convinced the XMOS audio processing chip is really gonna buy a lot. Audio input processing feels like a dynamic task that requires ongoing adaptation. XMOS is the most well known audio processor and a beast, but not sure it's really gonna help here!
I really hope we see some open-source machine-learned systems emerge.
I saw Insta360 announce their video conferencing solution today. Optics look pretty medium, nothing wild, but Insta360 is so good at video that I expect it'll be great. But there's a huge 14 microphone array on it, and that's the hard job: figuring out how to get good audio from speakers in a variety of locations around a room. It really made me wish for more open source footing here, some promising start, be it the conference room or open living space. I've given all of 60s to look through this, and was kinda hopeful because heck yeah Home Assistant, but my initial read isn't super promising: it doesn't look like this is starting the proper software base needed to listen well to the world.
https://petapixel.com/2024/12/17/the-insta360-connect-is-a-2...
how does this compare to ESP32-S3-BOX-3B ?
What is a good GPU to put in a home server that can run the TTS / STT and the local LLM required to make this shine?
A 3090 is too expensive and power hungry. Maybe a 3060 12GB? Is there anything in the "workstation" lineup that is more efficient, especially since I don't need the video outs?
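For sizing a card like that, a rough back-of-the-envelope check is that model weights take about (parameter count × bits per weight ÷ 8) bytes. The sketch below only does that arithmetic. The 2 GB `overhead_gb` default is a made-up placeholder for KV cache, activations, and the STT/TTS models sharing the card, and the 8B model size is purely illustrative, so treat the result as a first filter and not a benchmark.

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in (decimal) gigabytes."""
    # params_billions * 1e9 weights * bits / 8 bytes each, divided by 1e9.
    return params_billions * bits_per_weight / 8


def fits(params_billions: float, bits_per_weight: float,
         vram_gb: float, overhead_gb: float = 2.0) -> bool:
    """True if weights plus an assumed fixed overhead fit in VRAM.

    overhead_gb is a placeholder guess, not a measured figure.
    """
    return weights_gb(params_billions, bits_per_weight) + overhead_gb <= vram_gb


# Illustrative only: an 8B-parameter model on a 12 GB card.
print(weights_gb(8, 4))   # 4-bit quantized weights
print(fits(8, 4, 12))     # 4-bit on 12 GB
print(fits(8, 16, 12))    # 16-bit on 12 GB
```

Under these assumptions a 4-bit 8B model needs roughly 4 GB for weights and fits on a 12 GB card with room to spare, while the same model at 16 bits needs about 16 GB and does not. Real usage depends on context length and runtime, so measure before buying.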
Majel Barrett voice please.
i don't wanna talk to a computer
I'm actually really excited for this!
I noticed recently there weren't any good open source hardware projects for voice assistants with a focus on privacy. There's another project I've been thinking about where I think the privacy aspect is Important, and figuring out a good hardware stack has been a Process. The project I want to work on isn't exactly a voice assistant, but same ultimate hardware requirements
Something I'm kinda curious about: it sounds like they're planning on a sorta batch manufacturing by resellers type of model. Which I guess is pretty standard for hardware sales. But why not do a sorta "group buy" approach? I guess there's nothing stopping it from happening in conjunction
I've had an idea floating around for a site that enables group buys for open source hardware (or 3d printed items), that also acts like or integrates with github wrt forking/remixing