The very pursuit of accuracy could influence you to pick a terrible model

Photo by Craig Adderley from Pexels

This is Newt! He has a binary classification problem
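
To make the claim about accuracy concrete, here is a minimal sketch in Python. The labels are made up for illustration and are not taken from the article: with 95 negatives and 5 positives, a model that always predicts the majority class looks accurate while finding none of the positives.

```python
# Toy, made-up labels: 95 negatives and 5 positives (an imbalanced binary problem).
y_true = [0] * 95 + [1] * 5

# A "terrible" model that ignores its input and always predicts the majority class.
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual positives that the model found."""
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == 1 for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy={acc:.2f}, recall={rec:.2f}")  # accuracy=0.95, recall=0.00
```

Choosing a model on accuracy alone would favour this one, even though it never identifies a single positive case.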


It’s surprisingly quick and simple to predict the future with the all-too-familiar SQL syntax.

Photo by Anna Atkins on Unsplash

Expect rain. Only two simple words, yet the stakes can sometimes be much higher than grabbing an umbrella before leaving the house tomorrow. Rain could ruin picnic plans or bring tremendous joy to farmers who are desperate to save their drought-stricken crops.

Learning how to predict next-day rain is a simple and practical way to explore Machine Learning with Google BigQuery. So let’s find out how we can make it happen.

In this article, you’ll discover:

  1. Why BigQuery ML for Machine Learning?
  2. How to ingest and split the dataset into a training set and a test set?
  3. How to train…


The almighty NoSQL database to power mobile sites with real-time analytics for better customer experience

Photo by Thom Holmes on Unsplash

It began as the storage system for the web search index at Google. It was made available as a public service via Google Cloud Platform in 2015 and currently powers some of the most prominent web and mobile applications such as Spotify, Google Maps and Google Search. It’s none other than Cloud Bigtable, which is neither suitable for complex joins and ad-hoc interactive queries nor ideal for supporting an online transaction processing (OLTP) system.

So what is it and why might we need a massive database like Cloud Bigtable one day?

You’ll learn:

  1. The real-world challenge: Site speed and personalised customer engagement
  2. A primer…


How to avoid those dreaded pitfalls and “gotcha” moments when selecting databases for your next application?

Photo by Nik Shuliahin on Unsplash

Ask any developer of an enterprise application, and you’ll learn how frustrated they are with the limitations of relational databases. So much so that in 2009, a meetup was held in San Francisco to discuss open-source, distributed, non-relational databases.

Originally chosen to make a good Twitter hashtag for that meetup, NoSQL caught on like wildfire, yet there is still no generally accepted definition. Fast forward to 2021, and businesses have an abundance of choices when it comes to data storage solutions, both relational and non-relational.

Yet one thing remains the same…


In a post-COVID-19 future, businesses must be prepared to respond and adapt rapidly. Does the old Hadoop technology even have a place in this new world?

Photo by Wai Hsuen (Weixian) Chan on Unsplash

In the emerging world of 2021, big data and machine learning are still the holy grail while the need for superior speed and agility continues to accelerate cloud adoption. Does the old Hadoop technology even have a place in this new world? To shed some light on that question, let’s reflect on what Hadoop is and how the cloud is impacting Hadoop.

In this article, you’ll:

  1. Get to know Hadoop
  2. Explore Hadoop on the cloud
  3. Look to the future and beyond with modern cloud platforms

Get to know Hadoop

Why was Hadoop a big deal?

Once upon a time, as our business grew, we started to generate too much data…


When all you want is simple steps to get good data scattered everywhere into a single location for quick business insights, don’t let the complexities of coding and infrastructure hold you back.

Image by Author

Knowing how to integrate data from various sources and perform simple transformations to address data quality issues is the first step towards extracting insights from big data. In this blog post, we will explore how to build and deploy simple ETL data pipelines without coding via Cloud Data Fusion on Google Cloud Platform (GCP).

You’ll learn:

  1. Why do we need Cloud Data Fusion?
  2. How to build and deploy ETL pipelines with Cloud Data Fusion?

Why do we need Cloud Data Fusion?

The problem: Your first hurdle towards great insights from big data

Despite the rising popularity of sophisticated data analytics and machine learning techniques, something has never changed. The first hurdle towards great insights is usually data integration…


Computer vision has the potential to improve the consumer experience, reduce costs and enhance security. Understanding the fundamentals of computer vision is the first step towards unlocking a crucial competitive edge. Are you ready?

Image by Author

Computer vision is one of the most extraordinary gifts to come out of the artificial intelligence world. With computer vision, many companies have attempted to see the world through computers’ eyes and made great strides in solving complex business problems such as identifying product defects in real time, verifying customers’ identification or automating the insurance claims process. Overlooking such real-life applications of computer vision could mean missed opportunities to unlock growth, productivity and cost savings for businesses. So what is computer vision and how can it help?

You’ll learn:

  • What is computer vision?
  • How is computer vision applied in today’s world?
  • How do we…


Making Sense of Big Data

How do you obtain business insights when all you have on hand is raw unprocessed data scattered everywhere?

Image by Author

Born in the 1980s, it is about 40 years old and still counting. It’s none other than the data warehouse, which requires hefty investment and might take years to build, yet the chance of failure is sky-high. Fast forward to 2021: the data warehouse has evolved with the times and will continue to be the backbone for business insights across organisations all over the world. So what is it? Why do we need a data warehouse in the first place? As a data professional, what do you need to know about data warehouses at the bare minimum?

In this article, you’ll learn:


An effortless approach to determining what is in a BigQuery dataset and which tables are useful for analysis, using INFORMATION_SCHEMA and TABLES

Photo by author (Created using Canva.com)

Metadata! I bet you might have heard this term before and may have asked yourself what it is and why it is important. Let’s explore this concept with Google BigQuery.

In this article, you’ll discover:

  • What is metadata?
  • Why query table metadata in Google BigQuery?
  • How to query table metadata with INFORMATION_SCHEMA and TABLES?

What is metadata?

Many sources define metadata as “data about data”. But I personally find it too vague and difficult to understand. So here is my attempt to define metadata in layman’s terms.


Stepping up your SQL query game with these 2 advanced SQL concepts is easier than you think. Don’t miss out!

Photo by toine G on Unsplash

A good understanding of arrays and structs could be extremely powerful when analyzing big data because we can query faster and more efficiently with pre-joined tables from object-based schemas such as JSON or Avro files. In this blog post, we will explore arrays, structs and how to make use of their full potential in Google BigQuery through lots of examples.
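
As a rough illustration of the idea, here is a minimal sketch in plain Python rather than BigQuery SQL. The `order` record, its field names and the `flatten` helper are all hypothetical. A nested record keeps the parent fields and the repeated child items together in one row, while the flat form repeats the parent fields once per item, which is roughly what a join between two tables would produce.

```python
# Hypothetical order record, shaped like one row holding a struct and an array of structs.
order = {
    "order_id": 1,
    "customer": {"name": "Ada", "country": "AU"},  # struct-like field
    "items": [                                     # array of struct-like elements
        {"sku": "A1", "qty": 2},
        {"sku": "B7", "qty": 1},
    ],
}


def flatten(order):
    """Emit one flat row per item, repeating the parent fields on every row."""
    return [
        {
            "order_id": order["order_id"],
            "customer_name": order["customer"]["name"],
            "sku": item["sku"],
            "qty": item["qty"],
        }
        for item in order["items"]
    ]


rows = flatten(order)

# Working on the nested form needs no join: the items already travel with their order.
total_qty = sum(item["qty"] for item in order["items"])

for row in rows:
    print(row)
print("total quantity:", total_qty)
```

The nested form stores each order once, whereas the flattened form duplicates `order_id` and `customer_name` on every item row.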

You’ll learn:

  1. Why do we need to know arrays and structs?
  2. Arrays and how to work with them
  3. Structs and how to combine them with arrays to create nested records

Enough talking, let’s start!

Why do we need to know arrays and structs?

Skye Tran

Love data, problem-solving, and storytelling | Observe the world through technology-driven lens | Cherish order from chaos
