Began as the storage system for the web search index at Google. Made available as a public service via Google Cloud Platform in 2015, and currently powers some of the most prominent web and mobile applications, such as Spotify, Google Maps and Google Search. It’s none other than Cloud Bigtable, which is suitable neither for complex joins and ad-hoc interactive queries nor for supporting an online transaction processing (OLTP) system.
So what is it and why might we need a massive database like Cloud Bigtable one day?
Ask any developer of an enterprise application, and you’ll hear how frustrated they are with the limitations of relational databases. So much so that in 2009, a meetup was held in San Francisco for developers to discuss open-source, distributed, non-relational databases with their peers.
Originally chosen because it made a good Twitter hashtag for that meetup, the term NoSQL caught on like wildfire, yet it still has no generally accepted definition. Fast forward to 2021: businesses have an abundance of choices when it comes to data storage solutions, both relational and non-relational.
Yet one thing remains the same…
In the emerging world of 2021, big data and machine learning are still the holy grail while the need for superior speed and agility continues to accelerate cloud adoption. Does the old Hadoop technology even have a place in this new world? To shed some light on that question, let’s reflect on what Hadoop is and how the cloud is impacting Hadoop.
In this article, you’ll:
Once upon a time, as our business grew, we started to generate too much data…
Knowing how to integrate data from various sources and perform simple transformations to address data quality issues is the first step towards extracting insights from big data. In this blog post, we will explore how to build and deploy simple ETL data pipelines without coding, via Cloud Data Fusion on Google Cloud Platform (GCP).
Despite the rising popularity of sophisticated data analytics and machine learning techniques, something has never changed. The first hurdle towards great insights is usually data integration…
Computer vision is one of the most extraordinary gifts to come out of the artificial intelligence world. With computer vision, many companies have attempted to see the world through computers’ eyes and have made great strides in solving complex business problems such as identifying product defects in real time, verifying customers’ identities or automating the insurance claims process. Overlooking such real-life applications of computer vision could mean missed opportunities to unlock growth, productivity and cost savings for businesses. So what is computer vision and how can it help?
Born in the 1980s, about 40 years old and still counting. It’s none other than the data warehouse, which requires hefty investment and might take years to build. And yet the chance of failure is sky-high. Fast forward to 2021: the data warehouse has evolved with time and will continue to be the backbone of business insights for organisations all over the world. So what is it? Why do we need a data warehouse in the first place? And as a data professional, what do you need to know about data warehouses at the bare minimum?
In this article, you’ll learn:
Metadata! I bet you’ve heard this term before and may have asked yourself what it is and why it is important. Let’s explore this concept with Google BigQuery.
In this article, you’ll discover:
Many sources define metadata as “data about data”. But I personally find it too vague and difficult to understand. So here is my attempt to define metadata in layman’s terms.
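As a concrete taste of what we’ll cover, BigQuery exposes table metadata (names, types, creation times) through its INFORMATION_SCHEMA views. A minimal sketch, where `my_project` and `my_dataset` are hypothetical placeholder names:

```sql
-- List every table in a dataset along with its type and creation time.
-- INFORMATION_SCHEMA.TABLES is metadata: data about your tables,
-- queried without scanning the tables themselves.
SELECT
  table_name,
  table_type,      -- e.g. BASE TABLE or VIEW
  creation_time
FROM
  `my_project.my_dataset.INFORMATION_SCHEMA.TABLES`
ORDER BY
  creation_time DESC;
```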
A good understanding of arrays and structs could be extremely powerful when analyzing big data because we can query faster and more efficiently with pre-joined tables from object-based schemas such as JSON or Avro files. In this blog post, we will explore arrays, structs and how to make use of their full potential in Google BigQuery through lots of examples.
Enough talking, let’s start!
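To set the scene, here is a minimal, self-contained sketch of an ARRAY of STRUCTs queried with UNNEST; the table and column names are illustrative, not from any real dataset:

```sql
-- One order row holds its line items pre-joined as an ARRAY of STRUCTs;
-- UNNEST flattens the array back into one row per line item.
WITH orders AS (
  SELECT
    'ORD-001' AS order_id,
    [STRUCT('apple' AS product, 3 AS quantity),
     STRUCT('banana' AS product, 5 AS quantity)] AS line_items
)
SELECT
  order_id,
  item.product,
  item.quantity
FROM orders, UNNEST(line_items) AS item;
```

Because the line items live inside the order row, no join is needed at query time; UNNEST does the flattening on demand.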
I’m starting 2021 with one of the essential New Year’s resolutions: practice more complex SQL queries. If you’re in the same boat, join me to explore 3 useful SQL features with Google BigQuery.
You’ll learn how to:
Depending on your background, these might seem like basic features found in other relational databases, or they may appear exotic. Either way, I have included detailed examples and my approach to tackling each query. …
HyperLogLog (HLL) is an algorithm that estimates how many unique elements a dataset contains. Google BigQuery leverages this algorithm to approximately count unique elements in very large datasets of 1 billion rows and above.
In this article, we’ll cover 2 points.
The HyperLogLog algorithm can estimate cardinalities well beyond 10⁹ with a relative accuracy (standard error) of 2% while only using 1.5kb of memory.
(Philippe Flajolet et al., “HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm”)
HLL is one of many cardinality estimation algorithms that are all…
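In BigQuery, the HLL-based estimate is exposed through APPROX_COUNT_DISTINCT. A sketch against the public Wikipedia sample dataset (assumed to be accessible from your project), contrasting it with the exact count:

```sql
-- COUNT(DISTINCT ...) is exact but memory-hungry at scale;
-- APPROX_COUNT_DISTINCT trades a small, bounded error for far less memory.
SELECT
  COUNT(DISTINCT title)        AS exact_count,
  APPROX_COUNT_DISTINCT(title) AS approx_count
FROM
  `bigquery-public-data.samples.wikipedia`;
```

The two counts will differ slightly; the approximate version is typically within a few percent of the exact one, in line with the error bound quoted above.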