The Integrity Of Artificial Intelligence Is Under Threat

Withholding code hinders research and progress

Artificial Intelligence is growing up fast. Although modern computers were only invented in the mid-20th century, they have already evolved into the complex machines we rely on today. Artificial Intelligence now shapes a large proportion of consumer and business behaviour: the way we use the internet, how we manufacture goods, and even how we hire and fire our workforces.

However, as with any technology, when things grow too quickly, problems can arise. Artificial Intelligence as a scientific discipline might be struggling to keep up with the pace of change. A tendency to withhold computer code from published papers on AI has led to replicability issues with experiments. If left unchecked, this could threaten the very integrity of AI research itself.

Replicability problems

The reliability of scientific discovery depends on the scrutiny of the global scientific community. When researchers conduct experiments, they must report their methods in detail so that any other scientist, anywhere in the world, can replicate the process and test the value of their results. This open and critical system of peer review is the fundamental basis of modern science. If results cannot be reproduced, then flaws are exposed, and theories and conclusions can be challenged. It’s all in the name of the advancement of science.

With AI, the reproducibility of experiments has proven to be a serious issue. Machine learning involves randomness from one run to another: models start from random initialisations and are trained on data presented in a random order, so algorithms learning by trial and error will not do things in the same way each time, making it difficult to keep track of their progress. What’s more, algorithms are sensitive to the environments in which experiments are conducted. Environmental factors – hardware, software versions, parameter settings – can and will influence an algorithm’s performance, as will the training data supplied to get the algorithm ready for the experiment. If the precise details of these influences aren’t reported, then reproducing experiments and results becomes a near impossible feat.
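
To make the point concrete, here is a minimal sketch of the kind of detail a reproducible experiment has to pin down – a fixed random seed and a record of the software environment. The dataset, model and parameter choices are illustrative assumptions, not taken from any particular study.

```python
# A minimal reproducibility sketch: fix the random seed and record the exact
# conditions alongside the result. Dataset, model and parameters are
# illustrative assumptions only.
import json
import platform

import sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

SEED = 42  # without a fixed seed, two runs of the same script can diverge

# Synthetic data stands in for whatever training data a paper might use.
X, y = make_classification(n_samples=1000, n_features=20, random_state=SEED)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=SEED
)

model = RandomForestClassifier(n_estimators=100, random_state=SEED)
model.fit(X_train, y_train)

# Recording the environment alongside the result is what makes it checkable.
report = {
    "accuracy": model.score(X_test, y_test),
    "seed": SEED,
    "python": platform.python_version(),
    "sklearn": sklearn.__version__,
    "model_params": model.get_params(),
}
print(json.dumps(report, indent=2, default=str))
```

Omit the seed or the environment details from a paper, and a reader has little hope of reproducing the reported number.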

Being coy about the code

Experimental issues aside, there is another reason why AI faces a replicability crisis. Even in respected scientific journals, AI papers are often published without their source code. Researchers are reluctant to share details of the code and the data that algorithms are tested on, making it difficult for their results to be checked by outsiders. Whilst the motivations for keeping code under wraps are clear – it might be owned by a company, for example – this practice threatens the status of AI as a scientific endeavour.

In fact, AI researchers as a whole often lack incentive to focus on reproducibility in their studies. Under pressure to publish their papers quickly, they don’t have enough time to test algorithms under every possible condition, or enough space to write about all the parameters they apply to an experiment. Furthermore, researchers are often reluctant to report replicability issues for fear of criticising their seniors. In science, prioritising someone’s feelings over accuracy is a big faux pas. AI researchers, please take note.

Moving forward

If AI is to earn our respect, then its researchers are going to have to up their game in order to maintain the field’s integrity. The way forward has been shown by the likes of IBM, who have created a neural network capable of automatically recreating unpublished source code. By scanning AI research papers, IBM’s deep learning model is able to capture data and turn it into code, which can then be made publicly available, improving access to AI code for developers. In a similar vein, Joaquin Vanschoren of Eindhoven University of Technology in the Netherlands established OpenML to increase the availability of algorithms and data sets. OpenML’s mission to democratise machine learning will have far-reaching effects by making access to AI free and transparent for all. As we increasingly structure our world around AI, this ethic of openness will only become more important in the years to come.
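
To illustrate what OpenML-style openness looks like in practice, the snippet below pulls a public dataset by name and version through scikit-learn’s fetch_openml helper. The choice of dataset ("mnist_784") is simply a familiar example, not one singled out in the text.

```python
# Anyone, anywhere can retrieve the same public dataset from OpenML and check
# a published result against it. Pinning an explicit version ties the
# experiment to a reproducible snapshot of the data.
from sklearn.datasets import fetch_openml

mnist = fetch_openml("mnist_784", version=1, as_frame=False)

print(mnist.data.shape)    # (70000, 784) pixel features
print(mnist.target.shape)  # (70000,) digit labels
```

When data sets and algorithms are shared this openly, checking someone else’s result stops being a privilege and becomes routine.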

Does the practice of withholding code really threaten the integrity of AI? If your business developed an algorithm, would you want to share it? Should the democratisation of AI be a pressing concern for us all? Please share your thoughts.