Exploring SayCan: What It Does and Why It Matters
Upon visiting the SayCan page, I immediately noticed it is not a typical commercial tool but an academic research project by a large team at Google Robotics and Everyday Robots. The site clearly states the problem: large language models (LLMs) like GPT-3 lack grounding in physical reality. They can describe how to clean a spill but may suggest steps a robot cannot actually perform, like “use a vacuum cleaner” when no vacuum is present. SayCan solves this by combining LLM reasoning with learned affordance functions—value functions that estimate the success probability of executing a skill from the current state. The system iteratively picks skills that are both semantically useful and physically feasible, then executes them on a mobile manipulator. The approach is demonstrated in a kitchen scenario: given “I spilled my drink, can you help?” the robot might pick up a sponge and bring it over, instead of hallucinating a vacuum.
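To make that selection rule concrete, here is a minimal sketch of the core loop as the paper describes it: every skill in a fixed library is scored by the product of the LLM's usefulness estimate and the affordance function's success estimate, and the highest-scoring skill is executed until a termination skill wins. The function names and the toy scores below are my own illustrative stand-ins, not the project's actual code.

```python
# Minimal sketch of the SayCan decision rule (illustrative, not the released code).
from typing import Callable

def saycan_plan(
    instruction: str,
    skills: list[str],                                  # natural-language skill descriptions
    llm_score: Callable[[str, list[str], str], float],  # p(skill is a useful next step)
    affordance: Callable[[str], float],                 # p(skill succeeds from current state)
    max_steps: int = 10,
) -> list[str]:
    """Repeatedly pick the skill maximizing LLM usefulness * affordance."""
    history: list[str] = []
    for _ in range(max_steps):
        # Combined score: semantically useful AND physically feasible.
        best = max(skills, key=lambda s: llm_score(instruction, history, s) * affordance(s))
        if best == "done()":        # paper-style termination skill ends the plan
            break
        history.append(best)
        # A real system would execute `best` on the robot here and re-estimate
        # affordances from the new state before the next iteration.
    return history

if __name__ == "__main__":
    skills = ["find a sponge", "pick up the sponge", "bring it to you", "done()"]
    # Toy scores for illustration only: prefer earlier-listed skills not yet executed.
    plan = saycan_plan(
        "I spilled my drink, can you help?",
        skills,
        llm_score=lambda instr, hist, s: 0.0 if s in hist else 1.0 / (1 + skills.index(s)),
        affordance=lambda s: 1.0,   # real affordances are state-dependent value functions
    )
    print(plan)  # ['find a sponge', 'pick up the sponge', 'bring it to you']
```

The key design choice is that the LLM is never fine-tuned; physical feasibility enters purely through the multiplicative affordance term.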
My Hands-On Impressions and Technical Observations
There is no free tier or pricing to test, because SayCan is an open-source research project; instead, I explored the GitHub repository and the simulated tabletop environment the team released. There is no product dashboard, just a codebase with ROS-based integration. I ran the simulated environment on my local machine; setup required significant dependencies (PyTorch, MuJoCo, and several Google-specific libraries). The workflow is academic: you define a set of low-level skills (e.g., “pick up cup,” “go to sink”), train a value function for each, then pair them with a pretrained LLM (FLAN or PaLM). The code then runs a decision loop: the LLM scores each candidate skill as a next step, and the affordance function reweights that score. Note that the headline numbers come from the paper rather than anything you can reproduce locally: PaLM-SayCan roughly halves the error rate relative to the FLAN-based variant, reaching 84% correct skill selection and 74% successful execution. The technical backbone is clearly the combination of LLM scoring and learned affordances: no APIs, no cloud service, just a research framework.
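To illustrate the scoring half of that loop, here is a sketch of how candidate skills can be ranked by a causal LM's log-likelihood, which matches how the paper describes using its LLM (scoring fixed skill descriptions rather than free-form generation). Since PaLM and the paper's FLAN setup are not runnable locally, GPT-2 via Hugging Face stands in purely for illustration; the prompt format is a simplification of the paper's, and `skill_log_likelihood` is my own helper name.

```python
# Sketch: ranking candidate skills by LM log-likelihood ("scoring mode").
# GPT-2 is a stand-in for PaLM/FLAN, which are not publicly runnable.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def skill_log_likelihood(prompt: str, skill: str) -> float:
    """Sum of log-probs the LM assigns to the skill tokens, conditioned on prompt."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    skill_ids = tokenizer(skill, return_tensors="pt").input_ids
    # Concatenating token ids guarantees the prompt is an exact prefix.
    input_ids = torch.cat([prompt_ids, skill_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits          # (1, seq_len, vocab)
    log_probs = torch.log_softmax(logits, dim=-1)
    total = 0.0
    n_prompt = prompt_ids.shape[1]
    for i in range(skill_ids.shape[1]):
        pos = n_prompt + i                        # position of this skill token
        token_id = input_ids[0, pos].item()
        total += log_probs[0, pos - 1, token_id].item()  # predicted by previous position
    return total

prompt = "Human: I spilled my drink, can you help?\nRobot: I will"
for skill in [" find a sponge", " find a vacuum", " pick up the cup"]:
    print(skill.strip(), skill_log_likelihood(prompt, skill))
```

In SayCan, these per-skill likelihoods are normalized over the skill library and multiplied by the affordance values before the argmax, as in the earlier sketch.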
Market Position, Strengths, and Limitations
SayCan sits in the niche of LLM-based robotic task planning. Unlike general-purpose robotics frameworks such as ROS’s MoveIt or Nvidia’s Isaac Sim, SayCan focuses specifically on grounding language in what a robot can actually do. Related work includes Google’s own RT-2 (a vision-language-action model) and Microsoft’s ChatGPT for Robotics; SayCan predates both and is more modular. Strengths: the approach is elegant in that it explicitly addresses the grounding problem without retraining the LLM, the open-source simulation enables reproducibility, and the updated PaLM results show clear improvement. Limitations: this is purely a research tool. There is no ready-to-deploy API, no customer support, and the experiments rely on internal Google infrastructure (the paper uses Everyday Robots hardware). Real-world deployment requires extensive customization, and there is no pricing to list because nothing is for sale. SayCan is best suited for robotics researchers who want to integrate LLMs, not for developers building commercial products.
Who Should Use SayCan and Final Verdict
SayCan is ideal for academic labs and advanced hobbyists familiar with reinforcement learning, LLMs, and robotic control. If you want to experiment with grounding language in real or simulated robots, the released code and paper are a goldmine. However, if you need a plug-and-play solution for a factory floor or a smart home device, look elsewhere; consider emerging commercial offerings like Covariant.ai, or Google’s own PaLM-E if an API ever becomes available. My honest evaluation: SayCan is a brilliant proof of concept that has advanced the field, but it is not a product. The transparent documentation and open-source code earn trust, but the steep learning curve and lack of a polished interface limit its audience. Try it if you have the robotics stack and the patience to dive into research code. Visit SayCan at https://say-can.github.io/ to explore it yourself.