5 Reasons Why Containers Will Rule Data Science




Sponsored Post.

(Abstracted from this post on Gigantum)

A data scientist’s work is inextricably tied to data, and their analysis is tied to coding environments. We may still disagree about who should call themselves a data scientist, but one thing that clearly differentiates data scientists from computer scientists is the need to keep data closely tied to projects for data manipulation and modeling.

Enter containers. Historically, containers were a way to abstract a software stack away from the operating system, and for data scientists they offered few benefits.


Fast-forward to 2020, and leading data scientists in academia and industry are turning to containers to solve a new set of problems unique to the data science community. I believe containers will soon rule all data science work.

Here is why:

1. Consistent environments and coding interfaces for the whole team

Imagine being able to easily distribute an environment, much like an Amazon Machine Image (AMI), to every machine on your data science team. That means no more inconsistent package versions, ad hoc pip installs, or firewall issues. Containers make this possible.
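To make the version-drift problem concrete, here is a minimal Python sketch of the kind of check that a shared image makes routine: it compares installed package versions against a team's pinned versions using the standard library's `importlib.metadata`. It assumes the team records its pins as a simple name-to-version mapping; the function name `check_pins` and the pinned versions are hypothetical, chosen only for illustration.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple


def check_pins(pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Compare installed package versions with a team's pinned versions.

    Returns a mapping of package name -> (pinned, installed) for every package
    whose installed version differs from its pin; installed is None when the
    package is not installed at all.
    """
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    # Hypothetical pins; in practice they would come from the team's image definition.
    team_pins = {"numpy": "1.18.1", "pandas": "1.0.1"}
    problems = check_pins(team_pins)
    for name, (want, have) in problems.items():
        print(f"{name}: pinned {want}, found {have or 'not installed'}")
    if not problems:
        print("Environment matches the team pins.")
```

Run inside containers built from the same image, a check like this should report nothing, assuming the pins match what the image installs; run across hand-configured machines, it is where version drift tends to show up.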

2. Ability to lift and shift data science work: Sharing and collaboration

Containers hold environment information and references to data. This means that entire projects, complete with runnable Jupyter notebooks, can be passed to anyone on the data science team and moved from machine to machine.
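One way to picture a project that carries "environment information and references to data" is a small manifest: the container image the project runs in, plus each data file's path and SHA-256 digest, so a colleague on another machine can confirm they have the same data. This is a minimal Python sketch under that assumption; the manifest layout, the function names, and the image tag `example/team-env:2020.01` are hypothetical and not Gigantum's actual format.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(image: str, root: Path, rel_paths: List[str]) -> Dict:
    """Record the environment (an image tag) and a digest for each data file."""
    return {
        "image": image,
        "data": [{"path": rel, "sha256": sha256_of(root / rel)} for rel in rel_paths],
    }


def verify_manifest(manifest: Dict, root: Path) -> List[str]:
    """Return the data paths that are missing or whose contents have changed."""
    stale = []
    for entry in manifest["data"]:
        path = root / entry["path"]
        if not path.is_file() or sha256_of(path) != entry["sha256"]:
            stale.append(entry["path"])
    return stale


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "train.csv").write_bytes(b"x,y\n1,2\n")
        # Hypothetical image tag, for illustration only.
        manifest = build_manifest("example/team-env:2020.01", root, ["train.csv"])
        print(json.dumps(manifest, indent=2))
        print(verify_manifest(manifest, root))  # [] while the data is unchanged
```

Storing paths relative to the project root is a deliberate choice here: the same manifest can then be verified wherever the project is unpacked.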


3. Containers make data science projects hardware- and GPU-agnostic

Many companies provide virtual machines (VMs) to their data science teams for sandbox or production work. Over time, machines proliferate across the organization, each hosting projects that eventually need to be migrated. Without a strategy for migrating projects, data science jobs break or the organization ends up with an explosion of nearly worthless VMs. A containerized project carries its own environment, so it can move to new hardware instead of staying tied to the VM it was built on.

GPUs can also be shared like never before.

4. Kubernetes needs containerized applications

Kubernetes is all the rage, and at the core of this orchestration system are containerized applications. Kubernetes deploys and manages the underlying containers; however, a project must be containerized first.

(My contacts in industry are already telling me that IT is starting to require containerized applications.)  
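Because Kubernetes schedules containers rather than loose scripts, a containerized project can be handed to it as a short manifest. Below is a minimal Python sketch that builds a Kubernetes Pod manifest as a dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The pod name, image tag, and command are hypothetical; the optional `nvidia.com/gpu` limit assumes the cluster runs NVIDIA's device plugin, which is how GPUs are exposed to containers on such clusters.

```python
import json
from typing import Dict, List


def pod_manifest(name: str, image: str, command: List[str], gpus: int = 0) -> Dict:
    """Build a minimal Pod manifest that runs one container to completion."""
    container: Dict = {"name": name, "image": image, "command": command}
    if gpus > 0:
        # GPUs are requested as an extended resource; this key assumes the
        # NVIDIA device plugin is installed on the cluster.
        container["resources"] = {"limits": {"nvidia.com/gpu": gpus}}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        # "Never" suits batch-style work such as a single training run.
        "spec": {"restartPolicy": "Never", "containers": [container]},
    }


if __name__ == "__main__":
    # Hypothetical pod name, image tag, and entry point.
    manifest = pod_manifest(
        "train-model", "example/team-env:2020.01", ["python", "train.py"], gpus=1
    )
    print(json.dumps(manifest, indent=2))
```

Saved to a file such as `pod.json`, the output could be submitted with `kubectl apply -f pod.json`. For recurring batch work, a Kubernetes `Job`, which retries failed pods, is usually a better fit than a bare Pod.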

5. Cloud-agnostic, with zero cloud lock-in

GCP’s Dataproc, AWS’s SageMaker, and Azure Machine Learning all come with cloud lock-in (and potentially a huge price tag). When you build on a provider’s managed services, you are stuck with that provider for the project until you retire it or deliberately migrate away.

Properly used, containers insulate data science projects from the risk of cloud lock-in.

Would you like to know more about how containers are changing data science? Read more about how Gigantum handles containerized data science (here) or download the MIT-licensed client for authoring data science projects in R and Python and start using containers today (here).

Author: Shantun Parmar
