Trending Topics in Data Science in 2021


As the data science sphere continues to grow, so do the branches and technologies within it. Especially given the context of the pandemic, the area is seeing an explosion of new opportunities (and threats) that are worth analyzing. Let’s look at five important topics in the field:

1. Adversarial Machine Learning

Adversarial machine learning refers to techniques that attempt to trick a machine learning model by feeding it deceptive inputs. In image classification, for example, this works by tweaking pixel values in a way that a model picks up on but that is not discernible to the human eye. The image looks unchanged to a person, yet a machine learning or deep learning model misclassifies it. Some of these examples are funny or harmless (mistaking a panda for a gibbon), but others can be dangerous (mistaking a stop sign for a speed limit sign). To combat this, machine learning engineers are retraining their models on adversarial examples, so that the model still classifies correctly when it meets such inputs in the real world.
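To make the idea concrete, here is a minimal sketch of one well-known attack, the fast gradient sign method (FGSM). It is not a method named in this article, just a common illustration. The "classifier" is a toy logistic regression, and the weights, bias, input values and epsilon are all made up for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Probability of class 1 under a toy logistic-regression classifier."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def fgsm_perturb(weights, bias, x, y_true, epsilon):
    """Shift every feature by +/- epsilon in the direction that raises the loss.

    For logistic regression with cross-entropy loss, the gradient of the
    loss with respect to the input is (p - y_true) * weights.
    """
    p = predict(weights, bias, x)
    grad = [(p - y_true) * w for w in weights]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

# Hypothetical toy model and input (values chosen only for illustration).
weights = [2.0, -3.0, 1.0, -1.5]
bias = 0.0
x = [0.6, 0.3, 0.5, 0.2]  # stand-in for pixel values
y_true = 1

x_adv = fgsm_perturb(weights, bias, x, y_true, epsilon=0.1)
clean_score = predict(weights, bias, x)      # above 0.5: class 1
adv_score = predict(weights, bias, x_adv)    # below 0.5: class 0
```

No feature moves by more than 0.1, yet the predicted class flips. Retraining against such inputs amounts to adding pairs like (x_adv, y_true) to the training data.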

2. Accessibility

One of the fastest-growing trends in data science is making the field more accessible. One example of this is the rise of Kaggle competitions, which give competitors a dataset and task them with building machine learning and deep learning models on it. Winning projects have been the basis for academic papers, and winners have gone on to jobs at top tech firms. Anyone can participate, which encourages people from all backgrounds to take part. The site also hosts data science projects and datasets that can be accessed for free.

3. End-to-End Solutions

Companies are looking for more vertically integrated solutions for their data science initiatives in order to streamline the entire process into one workflow. This is evidenced by the rise of “DataOps”, which works similarly to DevOps in that it seeks to facilitate the end-to-end flow of data throughout an organization. Companies that provide products or services reflecting the DataOps methodology have seen great success. Whereas clients previously had to collaborate with various teams throughout the data processing cycle, there are now platforms that allow for a more integrated approach. The company Dataiku created software that helps its clients throughout the entire data cycle, from cleaning and processing all the way to building predictive models. The company is now valued at $1.4 billion and has clients such as Unilever, GE and Fox News Group.
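As a rough illustration of what "end-to-end" means here, the sketch below chains cleaning, model fitting and prediction into a single call. It is not how Dataiku or any other platform works internally; the function names and the sample data are hypothetical.

```python
def clean(rows):
    """Drop rows with missing values and convert the rest to floats."""
    return [(float(x), float(y)) for x, y in rows
            if x is not None and y is not None]

def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def run_pipeline(raw_rows, new_x):
    """One entry point covering cleaning, training and prediction."""
    points = clean(raw_rows)
    slope, intercept = fit_line(points)
    return slope * new_x + intercept

# Made-up raw data with gaps, as it might arrive from an upstream system.
raw_rows = [(1, 3), (2, 5), (None, 7), (3, 7), (4, None), (5, 11)]
forecast = run_pipeline(raw_rows, new_x=6)
```

The point of the single `run_pipeline` entry point is that nobody has to hand data from one team or tool to the next between steps.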

4. Hybrid Cloud

Many companies are seeking hybrid approaches to data storage that involve keeping some information in a public cloud and other data on site or in a private cloud. This gives a client greater security over sensitive data and more flexibility to migrate data between clouds. Cloud providers have been adapting by offering hybrid solutions such as “cloud on premises” and APIs that make it easier to connect a public and a private cloud. The research firm Mordor Intelligence expects the hybrid cloud market to grow from $52 billion in 2020 to $145 billion by 2026.
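The split between public and private storage can be pictured as a simple routing rule. The sketch below is only an assumption-laden illustration: the two dictionaries stand in for real storage services, and the list of sensitive fields is an invented policy.

```python
# Hypothetical stand-ins for the two storage tiers; a real setup would use
# a cloud provider's SDK and an on-premises or private-cloud data store.
public_cloud = {}
private_store = {}

# Assumed policy: any record containing one of these fields is sensitive.
SENSITIVE_FIELDS = {"ssn", "diagnosis", "card_number"}

def is_sensitive(record):
    return any(field in SENSITIVE_FIELDS for field in record)

def store(record_id, record):
    """Write the record to the tier its sensitivity calls for."""
    target = private_store if is_sensitive(record) else public_cloud
    target[record_id] = record
    return "private" if target is private_store else "public"

placements = {
    "r1": store("r1", {"page": "/home", "clicks": 12}),
    "r2": store("r2", {"name": "A. Patient", "diagnosis": "flu"}),
}
```

Here the clickstream record lands in the public tier, while the record with a diagnosis stays in the private one.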

5. Edge Computing

Edge computing is also on the rise. It is a distributed computing model that processes data close to where it is collected, which saves bandwidth and cuts response times. With the spread of IoT devices and sensors, low latency is critical for communication between devices. One of the best examples is the autonomous vehicle sector, where edge computing lets a vehicle process sensor data in real time. Other use cases include remote monitoring of oil and gas equipment to prevent failures and in-hospital patient monitoring.
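A minimal sketch of the bandwidth argument, assuming a device that summarizes a window of readings locally and sends only the summary upstream. The function name, threshold and readings are hypothetical.

```python
from statistics import mean

def summarize_on_edge(readings, alert_threshold):
    """Reduce a window of raw sensor readings to one small summary record."""
    return {
        "count": len(readings),
        "mean": mean(readings),
        "max": max(readings),
        # The alert is decided locally, with no round trip to a data center.
        "alert": max(readings) > alert_threshold,
    }

# Made-up temperature readings collected on the device.
raw_window = [71.2, 70.8, 71.5, 88.9, 71.0, 70.9]
summary = summarize_on_edge(raw_window, alert_threshold=85.0)
```

Instead of six raw values, one short record crosses the network, and the alert decision is made on the device itself.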





Sarah Wright
