An Intro to Ethical Issues in Data Science


Data is everywhere. Now more than ever, the internet holds almost inconceivable amounts of data just waiting to be mined by the eager researcher. The industry is continually discovering new and innovative ways to apply this data, from music recommendations to disease prediction. As these applications evolve, the time is ripe to consider the ethical problems that they pose, sometimes subtly, not only to the researcher but also to other interested parties. This short discussion presents four of the many ethical issues related to data science, based on the article Michael Fuller published this year entitled Big Data, Ethics and Religion: New Questions from a New Science.

First, any use of data requires implied or explicit consent. Too often, though, users meet this consent in the form of a privacy policy or EULA that is not only incredibly long but also confusing, dense with legal terminology and releases of rights. These documents are far too often focused on mitigating liability rather than genuinely explaining what can or will be done with the user’s data. This ethical inconsistency of informing users without actually informing them could be solved with readable explanations and more granular consent in place of a single agreement to the entire lengthy document. Far too often, the disciplines of law and software are so isolated from each other that they cannot coordinate well enough to care for their customers.

Second, an ethical tension exists between data-based intelligence and individual privacy. As Fuller presents, aggregating and analyzing individual medical information can be extremely useful, particularly for future research and the development of new treatments. At the same time, an individual can be easily identified from only three traits: gender, zip code, and year of birth, which means that removing names alone does not make a record anonymous. Understanding this fine balance will be key to the necessary discussion and disclosure of what data can be used and how it may be applied. The solution here, unfortunately, will not be an easy one.
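To make the re-identification concern concrete, here is a minimal sketch of how a researcher might check whether records in a data set are unique on those three traits. Everything in it is an assumption for illustration: the records are invented, and the names RECORDS, QUASI_IDENTIFIERS, unique_records, and k_anonymity are hypothetical helpers, not part of Fuller's article or any library.

```python
from collections import Counter

# Hypothetical toy records; every value here is invented for illustration.
RECORDS = [
    {"gender": "F", "zip": "27601", "birth_year": 1985, "diagnosis": "asthma"},
    {"gender": "F", "zip": "27601", "birth_year": 1985, "diagnosis": "diabetes"},
    {"gender": "M", "zip": "27601", "birth_year": 1990, "diagnosis": "asthma"},
    {"gender": "M", "zip": "27513", "birth_year": 1972, "diagnosis": "flu"},
]

# The three traits named in the text, treated as quasi-identifiers.
QUASI_IDENTIFIERS = ("gender", "zip", "birth_year")


def group_sizes(records, keys=QUASI_IDENTIFIERS):
    """Count how many records share each combination of the given traits."""
    return Counter(tuple(r[k] for k in keys) for r in records)


def unique_records(records, keys=QUASI_IDENTIFIERS):
    """Return the records whose trait combination appears exactly once."""
    sizes = group_sizes(records, keys)
    return [r for r in records if sizes[tuple(r[k] for k in keys)] == 1]


def k_anonymity(records, keys=QUASI_IDENTIFIERS):
    """Return the smallest group size, or 0 for an empty data set."""
    sizes = group_sizes(records, keys)
    return min(sizes.values()) if sizes else 0


if __name__ == "__main__":
    exposed = unique_records(RECORDS)
    print(f"{len(exposed)} of {len(RECORDS)} records are unique on {QUASI_IDENTIFIERS}")
    print("k =", k_anonymity(RECORDS))
```

A record that is unique on these traits could, in principle, be linked to a named person through any other data set that carries the same traits. A smallest group size of 1 means at least one person has no one else to hide behind, even though no name appears in the data.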

Third, ownership of data and the right to share it pose a significant ethical question. Does identifiable personal data always remain personal, or can it be freely shared, as data brokers do now? Some countries have actually required web-based services to retain surveillance data on their users for government access. As with the other dilemmas, these data “rights” must be responsibly disclosed and open for discussion. Individual privacy has to be balanced with security concerns, particularly at the civil level.

Fourth and perhaps most subtle, research bias can creep in, possibly even more than in typical scientific journals. Who collects the data? What data is cleaned from the set? What metrics are used for analysis? Does the presentation of results accurately portray the information? As Fuller recognizes, bias, whether intentional or not, can appear in data science at nearly every step of the process. Although bias will always be present, peer review and disclosure of methodologies, as practiced in other scientific disciplines, can help keep these decisions accountable and more ethically sound.

In short, data science is inseparable from ethics. The Bible says that money is not evil but rather the love of money is the root of all evil (1 Timothy 6:10). Likewise, data is not the problem; every concern revolves around how it is used. Open discussion and responsible transparency will be the major weapons in this battle of data control and regulation.

What are your concerns with Big Data? What do you think about cloud-based computing or data brokering? Add to the discussion by leaving a comment.

R. Christian Di Lorenzo

I'm a Data Science grad student and software engineer for Web and iOS. I craft unique and innovative software experiences to improve fellow humans' lives as I live for Christ.
  • North Carolina, U.S.