GitHub announced this week that it will begin using interaction data from Copilot users to train and improve its artificial intelligence models, marking a significant shift in the platform's data practices that has drawn immediate pushback from the developer community.

The new policy, announced on March 25 by GitHub Chief Product Officer Mario Rodriguez, takes effect on April 24. From that date, inputs, outputs, code snippets, and associated context from Copilot Free, Pro, and Pro+ users will feed directly into model training pipelines — unless those users manually navigate to their settings and disable the option. Copilot Business and Enterprise customers, along with students and teachers, are exempt under the terms of their existing contracts.

The scope of data collection is broad. GitHub's updated policy covers prompts sent to Copilot, generated suggestions that users accept or modify, surrounding code context near the cursor position, comments and documentation, file names, repository structure, navigation patterns, and even thumbs-up and thumbs-down feedback on suggestions. In practical terms, nearly every interaction a developer has with Copilot becomes potential training material.

Perhaps most contentious is the policy's relationship with private repositories. While GitHub states that content from private repositories "at rest" is not used for training, the company acknowledges that Copilot processes code from private repositories during active use — and that interaction data from those sessions can be collected for model training unless the user has opted out. The distinction between data "at rest" and data "in use" has struck many developers as a semantic manoeuvre that effectively erodes the meaning of "private."

Rodriguez justified the change by pointing to internal results. GitHub has already been incorporating interaction data from Microsoft employees into its training pipeline and claims to have seen "meaningful improvements," including higher acceptance rates for AI-generated code suggestions across multiple programming languages. The company argues that expanding this approach to the broader developer base will improve model performance for everyone.

The reaction from GitHub's own community has been decidedly cool. In the official discussion thread accompanying the announcement, users had registered 59 thumbs-down votes against just three positive reactions at the time of publication. Among dozens of comments, no community member other than GitHub's own VP of Developer Relations voiced support for the change.

GitHub framed the policy as consistent with "established industry practices," citing similar opt-out models at Anthropic, JetBrains, and parent company Microsoft. But critics note that "established industry practices" in this context means adhering to American norms rather than those of Europe, where opt-in consent is typically required under GDPR. The opt-out approach places the burden on individual developers to protect their data rather than requiring the company to seek permission first.

The move also raises questions about the evolving economics of AI-assisted development tools. GitHub currently offers Copilot free to millions of developers, a strategy that has been instrumental in building market dominance. Under the new policy, those free-tier users effectively pay for the service with their interaction data — a familiar bargain in the technology industry, but one that takes on new dimensions when the "data" in question is proprietary source code.

Data collected under the programme may be shared with GitHub's corporate affiliates, including Microsoft, though the company states it will not be shared with third-party AI model providers. Developers working on sensitive or proprietary projects who wish to opt out can find the setting under the Privacy section of Copilot account settings.

The policy change arrives at a moment when the AI industry's appetite for training data continues to outpace the supply of freely available material. Having already built Copilot's foundation on publicly available code from GitHub repositories — a practice that itself generated legal controversy — the company is now reaching deeper into its relationship with developers to fuel the next generation of models. Whether that relationship can sustain the weight of default data collection remains an open question.
