Companies Compete to Secure Real-World AI Training Data

Home services startup Pronto has begun piloting in-home video recordings to train physical AI systems, highlighting the rapid growth of the AI data capture and labeling industry. This sector, which remains loosely regulated, is crucial for the global robotics supply chain. Pronto is among several startups, including Human Archive, Humyn Labs, Egolab AI, and Neocambrian, that are collecting egocentric data through wearables and head-mounted cameras.
These companies partner with businesses such as cloud kitchens, hotels, and small factories to record everyday tasks, from cooking and cleaning to garment stitching and inventory sorting. Some startups have even set up dedicated ‘data factories’ equipped with motion-tracking technology.
Growing Demand for Data
Abhinav Kukreja, founder of Neocambrian AI, said his company's typical clients include robotics firms and developers of vision-language-action models. He pointed out that no comprehensive repository of physical behavior exists online, and that robots therefore need to learn from real-world environments such as homes and factories. Kukreja believes the work could create additional income for workers and households, since Neocambrian pays both the data collectors and the owners of the spaces where data is captured.
This data is essential for training AI systems to operate in unstructured environments, and the defense sector has shown significant interest in it for autonomous drone applications. However, the practice raises privacy and legal concerns, particularly when videos are recorded without consent or compensation. Reports indicate that some factories have halted these pilot programs following public backlash.
International Collaboration and Challenges
Manish Agarwal, co-founder of Humyn Labs, highlighted growing demand from robotics OEMs and software developers. His company converts the collected data into episodic strings intended to improve robots' memory and capabilities. Agarwal stressed that training data must reflect diverse environments, noting that the company's verified worker networks span 16 countries.
While proponents argue that this work positions India within the global AI value chain, skeptics see it as a cost-arbitrage play. Madhukar Yarra, CEO of NextWealth, described the data collection process as relying largely on unorganized gig work and suggested that the model's long-term viability remains uncertain.