ICT Insight with Institute of ICT Professionals: Career opportunities in the data space


By Kaunda Ismail

Data Engineering, Data Analytics, and Data Science form what I call the data journey or data pipeline.

I like the term pipeline because it has an easy, non-technical parallel in our daily lives. Ghana Water Company Limited (GWCL) has water pipelines, doesn’t it? I will use the GWCL analogy to explain these key job roles.

How do we get water in our homes?

The water supply system involves several stages to ensure that clean water reaches our homes and communities. There are many stages, but for our purposes, we’ll focus on the main ones:

Stage 1: Water Sources

Before Ghana Water Company Limited (GWCL) can supply us with water, they themselves must have the water. They can’t supply water to us if there is no water. So where do they get the water to supply it to our homes, companies, and communities?

The answer is obvious! They get it from rivers, lakes, groundwater, rainfall, etc. Some of the lakes are natural, and some are man-made. Man-made examples include Lake Volta and the Weija Reservoir (Weija Dam). These were built to collect water from different sources (rivers, rainfall, etc.) in one common place.

Once the water is collected in a reservoir, what happens next?

Stage 2: Water Treatment

Treatment includes filtration, disinfection, and removal of impurities. This ensures that the water reaching our homes is clean enough for our daily activities (drinking, cooking, cleaning, etc.).

Stage 3: Distribution Network

After treatment, water is pumped into a distribution network. This network consists of pipelines, pumping stations, and storage tanks. GWCL ensures water flows through these systems to reach homes and businesses.

Stage 4: House Connections

Household connections link individual homes to the water supply pipelines, and then we finally have the water in our homes for all purposes.

These four stages cover our simple water supply: from building the lakes to getting the water to our homes.

 

Understanding Key Data Roles

Data Engineers

Building lakes or reservoirs, building treatment facilities, building pipelines, treating (filtration, disinfection, removing impurities) the water, and getting the clean water to homes through the pipelines are the tasks of engineers. They can be civil engineers, hydraulic engineers, environmental engineers, etc.

Now, you know what? Replace “water” with “data,” and you’ve turned a Hydraulic Engineer into a Data Engineer. Replace “electricity” with “data,” and you’ve turned an Electrical Engineer into a Data Engineer. Replace “Dam” with “Data,” and you’ve turned a Civil Engineer into a Data Engineer.

Thus:

  • The same way a GWCL engineer builds lakes or reservoirs to collect water, a Data Engineer builds data lakes, reservoirs, or warehouses to collect data.
  • The same way a GWCL engineer builds treatment centres to treat water, a Data Engineer builds treatment centres (staging areas) to treat data.
  • The same way a GWCL engineer builds pipelines so that water can flow from storage to homes, a Data Engineer builds pipelines so that data can flow from storage to end users.

Data Engineers also use the words “Lake,” “Pipeline,” and “Storage.” The only keyword that changes is “Treatment.” While water is treated, data is transformed. But you are basically doing the same thing and aiming for the same goal: making it clean.
You make water clean by filtration, disinfection, and removing impurities. You make data clean by filtering, splitting, merging, and removing outliers or handling them separately.

In other words, a Data Engineer builds the data infrastructure (data lakes, storage facilities, data warehouses, databases, etc.) to collect data from diverse sources, then transforms the data and moves it through data pipelines to the intended users, or to a place where they can access it.
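To make this collect, transform and deliver flow concrete, here is a toy extract-transform-load (ETL) sketch in Python using only the standard library. It is an assumption-laden illustration: the two CSV "sources" and their columns are invented, and an in-memory SQLite database stands in for a real data warehouse or lake. In practice, a tool such as Azure Data Factory would orchestrate steps like these at much larger scale.

```python
import csv
import io
import sqlite3

# Two hypothetical "sources", held as CSV text so the example is self-contained.
SOURCE_A = "meter,region,litres\nM1,Greater Accra,420\nM2,Ashanti,390\n"
SOURCE_B = "meter,region,litres\nM3,northern,350\nM4,Ashanti,\n"


def extract(*sources):
    """Extract: read rows from every source into one list."""
    rows = []
    for text in sources:
        rows.extend(csv.DictReader(io.StringIO(text)))
    return rows


def transform(rows):
    """Transform: drop incomplete rows and standardise region names."""
    out = []
    for r in rows:
        if not r["litres"]:
            continue  # skip readings with no usage value
        out.append((r["meter"], r["region"].strip().title(), float(r["litres"])))
    return out


def load(rows, conn):
    """Load: write the clean rows into a table that end users can query."""
    conn.execute("CREATE TABLE IF NOT EXISTS usage (meter TEXT, region TEXT, litres REAL)")
    conn.executemany("INSERT INTO usage VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database standing in for a warehouse
load(transform(extract(SOURCE_A, SOURCE_B)), conn)
print(conn.execute("SELECT region, SUM(litres) FROM usage GROUP BY region").fetchall())
```

Keeping extract, transform and load as separate functions mirrors the water analogy: each stage can be inspected or replaced on its own without rebuilding the whole pipeline.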

When I say data lake, I mean it. “Data lake” is the actual industry term for this kind of storage; on the Microsoft Azure platform, for example, the service is called Azure Data Lake Storage. A data lake goes beyond a database or data warehouse. It is what the name says: a lake into which every kind of data flows, just as water from every source flows into Lake Volta or the Weija Dam.

What skills do you need to become a Data Engineer?

There are quite a lot, but for starters, learn MS Excel, Python, SQL, and Azure Data Factory. That should qualify you for an entry-level role.

Data Analyst

While the Data Engineer is responsible for building the lakes, pipelines, and treatment facilities, the Data Analyst ensures that the water (data) serves its intended purpose.

The Data Analyst is responsible for measuring, analysing, and reporting how the water is being used or distributed.

Below is a typical example of how an analyst will fit in our water scenario.

Stage 1: Gathering Insights from Water Usage

Once water reaches homes and businesses, people use it for different activities – drinking, cooking, cleaning, irrigation, etc. A Data Analyst would look at this water usage data to answer questions like:

  • How much water is being consumed daily in each region?
  • Which areas use more water and why?
  • Are there unusual patterns in water consumption that require attention?
  • At what time of day, month, or season is water used most, and why?
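Since SQL is one of the analyst skills mentioned in this article, here is a small sketch of how the first and last questions might be answered with SQL queries, run from Python's built-in `sqlite3` module. The table name, columns and readings are hypothetical, made up purely to show the pattern.

```python
import sqlite3

# Hypothetical daily readings (invented numbers) loaded into an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usage (day TEXT, region TEXT, litres REAL)")
conn.executemany(
    "INSERT INTO usage VALUES (?, ?, ?)",
    [
        ("2024-01-01", "Greater Accra", 500.0),
        ("2024-01-02", "Greater Accra", 520.0),
        ("2024-01-01", "Ashanti", 300.0),
        ("2024-01-02", "Ashanti", 340.0),
        ("2024-07-01", "Ashanti", 200.0),
    ],
)

# Average daily consumption per region, highest first.
# The inner query totals each region's usage per day; the outer query averages those totals.
per_region = conn.execute(
    """
    SELECT region, AVG(daily) AS avg_daily
    FROM (SELECT region, day, SUM(litres) AS daily FROM usage GROUP BY region, day) AS d
    GROUP BY region
    ORDER BY avg_daily DESC
    """
).fetchall()

# Total consumption per month (first 7 characters of the date), to spot seasonal patterns.
per_month = conn.execute(
    "SELECT substr(day, 1, 7) AS month, SUM(litres) FROM usage GROUP BY month ORDER BY month"
).fetchall()

print(per_region)
print(per_month)
```

With these sample rows, Greater Accra shows the higher average daily consumption, and the monthly totals hint at a seasonal difference between January and July.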

Stage 2: Monitoring Water Quality and Demand

To ensure consistent water supply and quality, GWCL would need regular checks. For example:

  • Is the water reaching its destination as expected?
  • Are certain areas experiencing shortages?
  • How does water usage vary during the rainy season compared to the dry season?

A Data Analyst does something similar: monitoring data quality and analysing trends to provide actionable insights that improve decision-making.

Stage 3: Reporting to Stakeholders

GWCL must report their findings to stakeholders such as government bodies or community leaders. They may need to show charts, graphs, and tables that communicate:

  • Monthly water usage statistics
  • Areas of high water wastage
  • Recommendations for infrastructure improvement

Stage 4: Supporting Operational Efficiency

Finally, based on their analysis, a GWCL team could recommend ways to improve water distribution and reduce wastage. For example:

  • Installing water meters in high-usage areas
  • Adjusting pipeline flow to reduce overuse or shortages

Likewise, a Data Analyst might suggest business strategies or operational adjustments based on their findings, such as:

  • Identifying cost-saving opportunities
  • Optimising processes to improve productivity

What skills do you need to become a data analyst?

Data analysis is the easiest route into the data space. Learn MS Excel, SQL, and Power BI, and you will qualify for an entry-level role.

Data Scientist

A Data Scientist goes a step further than a Data Analyst by not just interpreting the existing data but also using advanced tools and techniques to predict future trends, build models, and solve complex problems.

If we stick to the GWCL water system analogy, the Data Scientist’s role would involve applying science and predictive modelling to help GWCL make informed decisions about the future.

Scenario 1: Predicting Water Demand

Imagine GWCL wants to ensure a stable water supply during the dry season. They could ask the Data Scientist:

“Based on past consumption patterns and current population growth, how much water will we need to supply next year during the dry season?”

The Data Scientist would:

  1. Use historical data on water usage, weather patterns, and population growth
  2. Build predictive models (e.g., using Machine Learning algorithms) to forecast future demand
  3. Provide recommendations for infrastructure adjustments or additional water reservoirs
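As a much-simplified sketch of step 2, the snippet below fits a one-variable linear regression (ordinary least squares) of dry-season demand against population and uses it to forecast next year. The figures are invented, and a real model would use more inputs (weather, historical usage by region) and most likely a machine learning library; treat this only as an illustration of learning from history to predict the future.

```python
# Hypothetical history: (population in millions, dry-season demand in million m³).
# These numbers are invented to show the mechanics only.
history = [(2.0, 100.0), (2.2, 110.0), (2.4, 120.0), (2.6, 130.0)]


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(history)
projected_population = 2.8  # hypothetical projection for next year
forecast = slope * projected_population + intercept
print(f"Forecast dry-season demand: {forecast:.1f} million m³")
```

Because the made-up history grows by exactly 50 units of demand per million people, the fitted line predicts about 140 million m³ for a projected 2.8 million people.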

Scenario 2: Optimising Water Distribution

GWCL may face challenges like water shortages in certain areas. A Data Scientist could step in to:

  1. Identify inefficiencies in the water distribution system using data from sensors on the pipelines.
  2. Create an optimisation model to ensure water is distributed equitably, minimising wastage while meeting demand.
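The article does not say which optimisation model GWCL would use; linear programming is a common choice. As a simpler, swapped-in illustration of "equitable" distribution, the Python sketch below implements max-min fair allocation: limited supply is shared equally, no area receives more than it needs, and any leftover is redistributed among areas that still need water. The area names and quantities are hypothetical.

```python
def max_min_fair(supply, demands):
    """Split limited supply so no area gets more than it needs and the
    smallest allocation is as large as possible (max-min fairness)."""
    allocation = {area: 0.0 for area in demands}
    remaining = dict(demands)  # unmet demand per area
    left = float(supply)
    while left > 1e-9 and remaining:
        share = left / len(remaining)  # equal share for every still-unsatisfied area
        for area, need in list(remaining.items()):
            give = min(share, need)
            allocation[area] += give
            left -= give
            if need - give <= 1e-9:
                del remaining[area]  # fully served; leave it out of the next round
            else:
                remaining[area] = need - give
    return allocation


# Hypothetical example: 90 units of water for three areas needing 20, 40 and 60 units.
print(max_min_fair(90, {"Area A": 20, "Area B": 40, "Area C": 60}))
```

Here Area A needs only 20 units, so the remaining 70 are split evenly between Areas B and C, giving each 35. If supply exceeds total demand, every area is simply served in full.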

Scenario 3: Detecting Anomalies (Leakage Detection)

Suppose there’s a sudden drop in water pressure in certain pipelines, potentially indicating a leak. A Data Scientist could:

  1. Analyse real-time data from pressure sensors and flow meters across the pipeline network
  2. Use anomaly detection algorithms to pinpoint where the problem is likely to be
  3. Send alerts to the maintenance team to fix the leak before it worsens
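Step 2 mentions anomaly detection algorithms without naming one. One of the simplest is a z-score test, sketched below in Python: each sensor's latest pressure reading is compared with that sensor's own history, and a reading far below normal triggers an alert. The sensor names, readings and the threshold of 3 standard deviations are hypothetical choices for illustration; real systems often use more robust methods.

```python
from statistics import mean, stdev

# Hypothetical pressure history (in bar) per sensor; values are invented.
history = {
    "sensor_north": [3.1, 3.0, 3.2, 3.1, 3.0, 3.1],
    "sensor_south": [2.9, 3.0, 2.8, 2.9, 3.0, 2.9],
}
latest = {"sensor_north": 3.0, "sensor_south": 1.6}  # south shows a sharp drop


def find_pressure_drops(history, latest, threshold=3.0):
    """Flag sensors whose latest reading is more than `threshold` standard
    deviations below that sensor's own historical mean (a z-score test)."""
    alerts = []
    for sensor, readings in history.items():
        mu, sigma = mean(readings), stdev(readings)
        if sigma == 0:
            continue  # no variation in the history to compare against
        z = (latest[sensor] - mu) / sigma
        if z < -threshold:
            alerts.append((sensor, round(z, 1)))
    return alerts


for sensor, z in find_pressure_drops(history, latest):
    # Stand-in for a real alerting channel such as SMS or email to the maintenance team.
    print(f"ALERT: possible leak near {sensor} (z = {z})")
```

Only the southern sensor is flagged here: its latest reading sits far below its usual range, while the northern sensor's small dip is within normal variation.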

Scenario 4: Recommending Strategies for Long-Term Sustainability

GWCL might aim to reduce their reliance on natural water sources by investing in rainwater harvesting or desalination plants. The Data Scientist could:

  1. Simulate different scenarios using existing data (e.g., rainfall patterns, costs of desalination, population growth).
  2. Build a decision-making model to identify the most cost-effective and sustainable solution.
  3. Present a roadmap to implement the selected strategy.

What skills do you need to become a data scientist?

Learn Python, machine learning, and statistical analysis, and develop very good knowledge of the domain you work in.

Data Science is amazing and data in general is lovely.

Kaunda is a  | Member, IIPGH

For comments, contact [email protected] | Facebook.com/KaundaAi