Become a data scientist without a PhD: Part 2

Remember that $1.2 trillion in business value from AI? A newly released set of data science labs zooms in on Google Cloud’s data science and ML tools. Enroll in the Data Science on GCP: Machine Learning quest by Monday, November 15th and you’ll get a one-month pass to Qwiklabs (free of charge, no credit card required). You’ll run machine learning jobs with state-of-the-art tools and real-world datasets.

There are over 8,500 data scientist jobs on LinkedIn, and 68% of them require machine learning expertise. As a data scientist, you will transform data to:

  • Improve revenue, business agility, and customer experience
  • Reduce costs
  • Develop new products and/or product features

Don’t have these skills yet, or want to sharpen them? You don’t need a PhD. You can learn all of this and more in hands-on labs. Here are some of them:

  1. Machine Learning with Spark on Google Cloud Dataproc lab: Analyze data with Spark, using the PySpark interactive shell on the master node of a Cloud Dataproc cluster together with Google Cloud Datalab.

Then load the data into a Spark DataFrame and use it to develop, train, save, and restore a logistic regression model. You will then build data visualizations in Jupyter notebooks. In your model, does the probability of on-time arrival rise with flight distance? Share your results with @Qwiklabs!
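The lab fits its model with Spark. As a dependency-free illustration of what a logistic regression model computes, here is a minimal plain-Python sketch that fits P(on time | distance) by gradient descent. It is not the lab's code: the function names and the toy data points are hypothetical, made up only to show the idea.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # derivative of log loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Hypothetical toy data: distance in thousands of miles, 1 = arrived on time.
distances = [0.2, 0.4, 0.6, 0.8, 1.0, 1.5, 2.0, 2.5]
on_time = [0, 0, 1, 0, 1, 1, 1, 1]

w, b = fit_logistic(distances, on_time)
print(f"weight on distance: {w:.3f}")
for d in (0.3, 1.0, 2.5):
    print(f"distance {d}k mi -> P(on time) = {sigmoid(w * d + b):.2f}")
```

A positive weight on distance means the fitted on-time probability rises with distance, which is exactly the question the lab asks you to answer with the real flight data.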

  2. Processing Time-Windowed Data with Apache Beam and Cloud Dataflow (Java): You’ll configure an Apache Maven project using the starter project archetype for Cloud Dataflow.

Be patient: the project takes a few minutes to compile. When the build succeeds, Maven reports BUILD SUCCESS.

Finally, you’re ready to deploy a Java Apache Beam pipeline that creates training and test data files.

If the pipeline succeeds, you should see the training and test data files it produced. Did you get it? Share your results with @Qwiklabs!
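In the lab, Beam handles windowing for you (in the Java SDK, for example, via `Window.into(FixedWindows.of(...))`). To show the underlying idea, here is a minimal plain-Python sketch of fixed, non-overlapping time windows: each timestamp falls into exactly one window, and values are averaged per window. The helper names and the sample events are hypothetical and not taken from the lab.

```python
from collections import defaultdict

def window_start(ts: int, size: int) -> int:
    """Start of the fixed window [start, start + size) that contains timestamp ts."""
    return ts - (ts % size)

def windowed_means(events, size):
    """Average the values of (timestamp, value) events within each fixed window."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in events:
        start = window_start(ts, size)
        sums[start] += value
        counts[start] += 1
    # Return windows in chronological order, keyed by window start time.
    return {start: sums[start] / counts[start] for start in sorted(sums)}

# Hypothetical events: (seconds since epoch, departure delay in minutes).
events = [(0, 5.0), (1800, 15.0), (3600, -2.0), (5400, 4.0), (7300, 30.0)]
print(windowed_means(events, size=3600))  # one-hour windows
# prints {0: 10.0, 3600: 1.0, 7200: 30.0}
```

Keying each window by its start time keeps the assignment deterministic, so the same event always lands in the same window no matter the order in which events arrive.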

  3. Bayes Classification with Cloud Datalab, Spark, and Pig on Google Cloud Dataproc: Have you ever quantized a dataset? Here’s your chance. Use Dataproc, Datalab, and Spark to quantize a dataset and improve the accuracy of a model. Then visualize your data in Jupyter notebooks and process it with Apache Pig.
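The lab runs this at scale on Dataproc. Here is a minimal plain-Python sketch of the core idea: quantize a continuous feature into bins, then estimate P(on time | bin) from counts, the kind of frequency-based estimate a Bayes classifier builds on. Choosing bin edges by quantiles, the function names, and the toy data are all assumptions for illustration; the lab's exact binning may differ.

```python
from bisect import bisect_right
from collections import Counter

def quantile_edges(values, num_bins):
    """Interior bin edges so each bin holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // num_bins] for i in range(1, num_bins)]

def quantize(value, edges):
    """Index of the bin containing value, from 0 to len(edges)."""
    return bisect_right(edges, value)

def on_time_rate_per_bin(values, labels, edges):
    """Empirical P(on time | bin): the fraction of positive labels in each bin."""
    totals, positives = Counter(), Counter()
    for v, y in zip(values, labels):
        b = quantize(v, edges)
        totals[b] += 1
        positives[b] += y
    return {b: positives[b] / totals[b] for b in sorted(totals)}

# Hypothetical toy data: departure delay in minutes, 1 = arrived on time.
delays = [-5, 0, 2, 4, 8, 12, 20, 35, 50]
on_time = [1, 1, 1, 1, 1, 0, 1, 0, 0]

edges = quantile_edges(delays, num_bins=3)
print(edges)  # prints [4, 20]
print(on_time_rate_per_bin(delays, on_time, edges))
```

To classify a new flight, quantize its delay with the same edges and predict "on time" when that bin's rate clears your chosen threshold.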

And don’t forget Part 1, the Data Science on GCP quest. Both quests cover hands-on exercises from the book Data Science on Google Cloud Platform by Valliappa Lakshmanan (Lak).