Introduction
This guide helps users understand how InfoTiles ingests data, processes it, and presents insights through maps, visualisations, and dashboards. The goal is to help you quickly find relevant datasets, interpret analyses, and use filtering and search in a safe and efficient way.
The guide is practically oriented, focusing on day-to-day work with data, while also explaining core principles and assumptions that affect results.
How InfoTiles handles data
InfoTiles supports three main types of input data. The only requirement is that data is structured for machine readability — i.e. table format or database format.
- Data fetched from source systems and processed automatically by machine learning models.
- Datasets you upload directly into the solution as static tables (CSV, (geo)JSON, shapefile, or similar).
- Datasets fetched directly from an external source without processing — for example, publicly available weather data or a direct mirror of a specialist system.
Together these form a flexible data pipeline that can accommodate both continuous streams and periodic, manual updates.
Getting started
New to InfoTiles? See the Explore ready-to-use dashboards article and the Interact with a Dashboard guide for step-by-step instructions on opening and working with dashboards.
Using the Dashboard
Basic navigation and use
Dashboards are fully interactive: selecting, clicking, or marking an element (map, chart, or table) automatically filters all other visualisations.
- At the top of the dashboard you will find search, filter, and time selector controls, used to narrow which data is shown and which period the analysis covers.
- In the search field you can use free text to search directly on name, ID, or properties to find specific objects such as pumps, pipes, or zones.
- The map can be navigated with zoom and pan, and map layers can be toggled on and off to reduce complexity and focus on relevant objects.
- Clicking elements in charts sets the selected property as a filter, and all visualisations update immediately.
- You can filter by position or focus on objects within a geographic area by selecting / marking an area on the map.
- Active filters are shown at the top and can be removed by clicking "×" to return to the full overview.
- Individual visualisations can be maximised for a better overview — click the three-dot menu and choose Maximise.
- Tables can be used for sorting and downloading for further analysis. To export to CSV, click (…) in the top right and choose Download CSV. See also: How to export data as a CSV file.
Filter function and search function
Filters can be applied across maps, charts, and tables — for example by zone, ownership, material type, age, risk level, or time period. All filters work dynamically and update visualisations immediately.
If two datasets share columns/fields with the same name, you can filter both datasets simultaneously using the search function. When using the filter function (the green square), first select the Data View and then the query. The search field (highlighted in red) searches across everything: if the value "PS100" exists in both sources, it will appear in both result sets, even when the value is stored in fields with different names.
For example, you can display only pipes (from the pipes dataset) and breaks (from the work-order log) on pipes with a specific material type — even though the break records and pipe records are in different datasets.
Note: If a dashboard contains charts and visualisations based on different datasets, a filter applied to one dataset will cause visualisations based on other datasets to appear empty — the no results found icon will be shown.
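The cross-dataset filtering described above can be sketched in a few lines. This is an illustrative example, not InfoTiles internals; the field names and values are invented:

```python
# Two hypothetical datasets that share a "station" field.
pipes = [
    {"station": "PS100", "material": "PVC"},
    {"station": "PS200", "material": "PE"},
]
workorders = [
    {"station": "PS100", "type": "break"},
    {"station": "PS300", "type": "blockage"},
]

def apply_filter(rows, field, value):
    """Keep only rows where the shared field matches the filter value."""
    return [r for r in rows if r.get(field) == value]

# One filter value propagates to both datasets at once.
filtered_pipes = apply_filter(pipes, "station", "PS100")
filtered_orders = apply_filter(workorders, "station", "PS100")
# A dataset with no matching rows would render as "no results found".
```

This mirrors the behaviour in the note above: a visualisation built on a dataset with no matching rows ends up empty.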
Input Data, Data Quality, Analysis and Machine Learning
Processed input data — PipeFusion
All datasets starting with pf_ have been ingested via a machine learning tool called PipeFusion. PipeFusion uses data from Gemini VA as its source and will by default apply a filter for status: Drift (in service). Pipes from Gemini with other status values — removed, replaced, or projected — will not appear in PipeFusion results.
As part of quality assurance, missing nodes, missing connections, and incorrect pointers from Gemini are corrected. The original value from LSID/PSID is retained in the field pf_id_source, and changes are documented under pf_history. Total node count and pipe length may be updated as a result of corrections.
For analysis of raw data, dedicated data views exist for the relevant tables, prefixed with Gemini.
Datasets uploaded directly by the user — client datasets
Datasets uploaded as "dead" (static) tables should start with client_ to distinguish them from the rest, and to avoid conflicts between read and write permissions in the solution.
This is useful if you have data in e.g. Excel that you want to combine with other analyses. Be aware that datasets starting with client_ are not updated unless you or someone with access to the source file manually uploads a new version. You should therefore not use this method for data that needs to be updated more than once a year.
Tabular data (such as CSV) has its own shortcut on the landing page (reached by clicking the InfoTiles logo in the top left). See also: How to import data from a file.
It is also possible to load your own spatial data while working in the map: choose Add layer, followed by Upload file. Shapefile format is recommended, as coordinates are reprojected automatically without manual configuration.
Naming conventions for uploaded datasets
Good practice when uploading your own dataset is to include the date at the end of the filename — either just the year or the full date. Date formats should follow one of these:
- yyyymmdd (standard in the data world, easy to sort from newest to oldest)
- dd-mm-yyyy
Also give the dataset a meaningful name so colleagues understand what it contains. Avoid spaces and full stops in the name to prevent issues with system settings and file format parsing.
Examples of good dataset names:
- client_watersamples_2024
- client_basement_flooding_2018-2024
- client_sewagezones_20230416
- client_unidirectional_watersupply_16-04-2023
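The naming rules above can be checked mechanically. This is a hypothetical helper, not part of InfoTiles; it simply encodes the conventions described here (client_ prefix, no spaces or full stops, a date suffix in one of the accepted formats):

```python
import re

# Accepted date suffixes: yyyymmdd, yyyy, yyyy-yyyy, or dd-mm-yyyy.
PATTERN = re.compile(
    r"^client_[a-z0-9_]+_"
    r"(\d{8}|\d{4}(-\d{4})?|\d{2}-\d{2}-\d{4})$"
)

def is_valid_name(name: str) -> bool:
    """Return True if a dataset name follows the naming conventions."""
    return " " not in name and "." not in name and bool(PATTERN.match(name))
```

Running the good examples from the list above through this check returns True for all of them, while a name with spaces or full stops is rejected.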
Data directly from source
Datasets loaded directly from the source, without processing in PipeFusion, are named from the source system. For example: Gemini_workorder_diary.
Analyses Performed by PipeFusion
PipeFusion performs several analyses on datasets from the pipe database and work-order systems, together with associated observations and operational data (e.g. maintenance leak reports, pipe inspection reports) and other relevant sources such as subscriber databases and various sensor data and measurements (e.g. SCADA data, weather data, water quality data).
Key analyses and data generation performed by PipeFusion:
- Network connections and zone divisions
- Probability of failure/breakage
- Consequence of failure/breakage
- Risk of failure/breakage (based on probability and consequence)
- Inflow/infiltration calculations
- Water consumption for calculation and alerting of water leaks with associated area delineation
Network connections and zone divisions
Generating missing connections
PipeFusion identifies missing connections by building a graph model of the pipe network based on nodes and pipe elements, then testing connectivity, direction, and logical relationships in the network. The algorithm detects breaks in the structure — pipes that are close to each other without being connected, incorrect endpoints, or missing nodes. Based on spatial proximity and network logic, PipeFusion proposes automatic corrections that connect elements correctly. These corrections can be reviewed and validated before being used further in analyses.
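The core idea of proximity-based correction can be sketched simply. This is a simplified illustration, not the PipeFusion algorithm; coordinates, pipe IDs, and the tolerance are invented:

```python
import math
from itertools import combinations

# Hypothetical pipes, each given by its two endpoint coordinates.
pipes = {
    "p1": ((0.0, 0.0), (10.0, 0.0)),
    "p2": ((10.2, 0.1), (20.0, 0.0)),    # starts ~0.22 units from p1's end
    "p3": ((50.0, 50.0), (60.0, 50.0)),  # far from everything else
}

def missing_connection_candidates(pipes, tolerance=0.5):
    """Flag pairs of pipes whose endpoints lie within `tolerance` of each
    other but that do not already share a node."""
    candidates = []
    for (a, ends_a), (b, ends_b) in combinations(pipes.items(), 2):
        if set(ends_a) & set(ends_b):
            continue  # already connected at a shared node
        gap = min(math.dist(pa, pb) for pa in ends_a for pb in ends_b)
        if gap <= tolerance:
            candidates.append((a, b, round(gap, 2)))
    return candidates
```

Here p1 and p2 would be flagged as a candidate correction, to be reviewed before use, just as the article describes.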
Generating missing pipes
PipeFusion can generate new pipes by:
- Adding short pipes where they are missing.
- Splitting existing pipes into multiple components at nodes, to maintain a hydraulically correct model.
PipeFusion generates missing pipes by analysing the network structure and identifying logical breaks between nodes that should be connected but lack a registered pipe.
Service connections: The solution can also generate service connections from a municipal pipe to an uploaded point representing a subscriber (location of water meter, registered subscriber, building, or other relevant geopoint). PipeFusion does this by calculating an assumed route, which is logical for a machine; topography, physical obstacles, and practical conditions are not taken into account. Total pipe length in InfoTiles will therefore be higher than in the source system.
Missing connections: Below is an example where PipeFusion flags a possible missing connection or pipe between two points. Two points are shown as candidates for correction — where the actual fix must be made in the source system (Gemini VA).
Pipes not connected to the network (unconnected)
PipeFusion traverses all valid paths through the network. Pipes connected to each other in groups of at least 20 pipes are assigned a value under Group. Some pipes, however, are too far from anything to connect to, meaning that from a data perspective it is not possible to carry water to or collect water from them. This layer is designed to simplify searching for such pipes, which can be difficult to detect in Gemini using methods such as parallel scrolling or visual map analysis.
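A minimal sketch of the grouping idea, using plain breadth/depth traversal over an invented network (this is not the PipeFusion implementation):

```python
from collections import defaultdict

def connected_groups(edges):
    """edges: list of (pipe_id, node_a, node_b).
    Returns lists of pipe IDs, one list per connected component."""
    adj = defaultdict(set)
    for _, a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        components.append(comp)
    return [[pid for pid, a, _ in edges if a in comp] for comp in components]

# Hypothetical network: p1-p2 form one group, p9 is isolated.
edges = [("p1", "n1", "n2"), ("p2", "n2", "n3"), ("p9", "n8", "n9")]
groups = connected_groups(edges)
# Groups smaller than the threshold (20 in InfoTiles) would be flagged
# as unconnected; in this toy example every group is below it.
```

In the real solution, only components of at least 20 pipes receive a Group value; the remainder surface in the unconnected layer.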
Zone divisions for water (DMAs)
In Norway, we typically distinguish between two types of zone division for drinking water networks: metering zones and pressure zones. In English, zones in the drinking water network are often called DMAs (District Metering Areas, i.e. metering zones), though DMA can also stand for District Managing Areas, which is a broader concept. PipeFusion calculates two types of zones for water:
- Pressure zones: Zone type delimited by water works, reservoirs, booster stations, closed valves, and pressure reduction valves (PRVs). Check valves are classified as PRVs, as they behave the same way in the zone logic.
- Consumption zones (DMA): Use the same main objects as above, but replace PRVs with bulk meters. Note: the method for creating metering zones always starts from a "measurement point" with entity type "MM" in the table VA_LEQUIP in Gemini. If an area is expected to be a metering zone but shows no value, check which objects define the boundary — often a closed valve is missing or misclassified.
Zone divisions for sewage
Generating sewage zones
Input data: pipes of types combined sewer (AF), wastewater (SP), and stormwater (OV) with subcategories, plus stations (PAF, PSP, POV), treatment plants (RA), overflows (OVL), and outfalls (UTL/UTS).
Different municipalities use different practices for sewage zone definitions. PipeFusion generates zones based solely on network information (points and pipes) and does not account for or override municipality-specific zone practices.
The algorithm starts at treatment plants and travels backwards through the network until it stops at a node that is the starting point for a zone. After all pipes and points are analysed, an additional step traverses the network to map cross-connections and other special cases.
For sewage, the network is divided into zones by splitting it into connected sewage zones sorted by type of endpoint: pump station/pump, overflow, treatment plant, outfall, or "Ambiguous" (areas that can drain multiple ways due to cross-connections).
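The backward traversal described above can be sketched as follows. The network, node names, and zone names here are invented; the real algorithm also handles cross-connections and the "Ambiguous" category:

```python
from collections import defaultdict, deque

# Hypothetical directed sewage network: flow runs from -> to.
edges = [
    ("A", "B"), ("B", "RA1"),   # branch draining to treatment plant RA1
    ("C", "RA1"),
    ("D", "RA2"),               # separate branch draining to RA2
]
endpoints = {"RA1": "zone_RA1", "RA2": "zone_RA2"}

# Invert the flow direction so we can walk upstream from each endpoint.
upstream = defaultdict(list)
for src, dst in edges:
    upstream[dst].append(src)

zone_of = {}
for ep, zone in endpoints.items():
    queue = deque([ep])
    while queue:
        node = queue.popleft()
        if node in zone_of:
            continue  # already claimed; real networks may mark "Ambiguous"
        zone_of[node] = zone
        queue.extend(upstream[node])
```

Each node ends up tagged with the zone of the endpoint it drains to, which is the essence of starting at the treatment plant and travelling backwards.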
You can also filter on different network types in the map by toggling layers in the map menu: Sewage/wastewater, Stormwater, Combined, Other.
Outfall
Outfall zones are generated from outfall points, where water exits the system to the recipient; these correspond to entity type UTS/UTL in Gemini.
Zone naming
Zone naming is based on the station name (pf_station). When Gemini VA is the source for the pipe database, this field is linked to the REF/EXREF/STATION fields where the structured name (e.g. PS100) is stored.
Visual zone delimitation vs. zone as a pipe property
If you filter by geometry on the map, you will get all pipes visible within the zone, but not all of these are necessarily connected to the network in that zone. If you filter by zone name under the pipes' properties, you will only see the pipes that PipeFusion has connected to that zone.
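The difference between the two filters can be made concrete with a small sketch. All data here is invented, and the zone is simplified to a bounding box:

```python
# Hypothetical pipes: p2 lies visually inside the area of zone PS100,
# but PipeFusion has connected it to a different zone.
pipes = [
    {"id": "p1", "mid": (5, 5), "zone": "PS100"},
    {"id": "p2", "mid": (6, 4), "zone": "PS200"},
]
zone_bbox = (0, 0, 10, 10)  # xmin, ymin, xmax, ymax of zone PS100's extent

def in_bbox(point, bbox):
    x, y = point
    return bbox[0] <= x <= bbox[2] and bbox[1] <= y <= bbox[3]

# Geometric filter: everything visible within the area.
geometric = [p["id"] for p in pipes if in_bbox(p["mid"], zone_bbox)]
# Attribute filter: only pipes whose zone property matches.
attribute = [p["id"] for p in pipes if p["zone"] == "PS100"]
```

The geometric filter includes p2 because it lies inside the area, while the attribute filter excludes it, exactly the distinction described above.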
Risk calculations
This section describes how risk in the pipe network is calculated through a combination of probability of failure and consequence of failure. This provides a more complete and decision-supporting risk picture that can be used for targeted prioritisation of measures.
Probability of failure
Probability of failure is calculated using machine learning trained on historical failure and maintenance data from many Norwegian municipalities. The model uses, among other things, age, material, and other properties of the pipe network, combined with registered incidents and work orders. The algorithm also considers the network structure — how pipes and nodes are connected in the system — using a Graph Neural Network (GNN) that analyses connections and patterns in the network.
The result is a probability value indicating how likely it is that a component has already failed or will fail soon. Input data includes:
- Pipe database: pipe function (water, sewage, combined, stormwater), material type, year of installation, geographic position in the network.
- Work-order/operational data: failure history for pipes of comparable types, blockage history, pipe inspection data.
PipeFusion's calculated probability of failure is accurate for around 90% of pipes. Since you cannot know in advance which pipes the model is wrong about, you should always treat the result with healthy scepticism. One way to address this is to display a reference calculation based on empirical statistics in parallel, for example Norsk Vann's prioritisation table for pipe replacement, making it easy to spot large deviations between PipeFusion and expected results.
Consequence calculations
Consequence is calculated by analysing what happens in the network if a given element fails or stops functioning. The model is based on mathematical graph theory and requires the network to be stored as a graph database — it is therefore calculated on the result from PipeFusion, not raw data. The core principle is to calculate what share of the network loses connectivity if a pipe is removed.
Sewage: The sewage network is defined with flow direction, so the method starts at the outermost part of the network, an end node, and follows the flow direction while calculating backwards how many elements depend on the given pipe. The result is normalised based on position and hierarchy and given a value between 0 and 1. Categories 1–5 are also produced so results can be compared with other approaches or set up as a risk matrix following the DiVa method.
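The normalisation and binning step for sewage can be sketched as follows. The dependency counts are invented, and the exact normalisation used by PipeFusion (based on position and hierarchy) is more involved than this simple scaling:

```python
# Hypothetical counts of how many elements depend on each pipe.
dependents = {"p1": 120, "p2": 30, "p3": 0}
max_dep = max(dependents.values())

def consequence(count, max_count):
    """Scale a dependency count to a 0-1 score and bin it into
    categories 1-5 for use in a risk matrix."""
    score = count / max_count if max_count else 0.0
    category = min(5, int(score * 5) + 1)
    return score, category

scores = {pid: consequence(c, max_dep) for pid, c in dependents.items()}
```

The 0-1 score supports continuous ranking, while the 1-5 category allows comparison with other approaches such as a DiVa-style risk matrix.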
Water: Water supply is more complex as there is no fixed flow direction and pipes can belong to ring systems. First, all paths water can travel from the water works to an end node are calculated, and pipe connections belonging to many travel paths get higher consequence than those in only a few possible routes. Redundancy is also taken into account, so alternative supply routes can reduce consequence. Pipe dimension is used as a factor in consequence calculations within each ring system, so larger dimensions get higher consequence and smaller pipes are graded down based on their size relative to the largest pipe in their group.
Inflow and infiltration analyses
Inflow/infiltration (fremmedvann) is calculated per pump station by analysing incoming volumes of sewage based on available measurement data — flow measurements, level measurements in sumps, or pump run-times combined with known pump capacity. The model first establishes a reference level based on dry periods, where water added is mainly assumed to be sanitary sewage. Deviations above this level are interpreted as inflow/infiltration — typically from infiltration, leakage, or faulty connections.
- Gross inflow: Total volume of inflow handled by a pump, without accounting for what has already been added from upstream areas.
- Net inflow: Inflow volume originating within the pump's own catchment area — inflow from upstream pump stations is deducted based on the hierarchical structure of the pipe network. Net inflow gives a better picture of where in the network the inflow actually originates and where measures should be prioritised.
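The gross vs. net distinction is simple arithmetic once the station hierarchy is known. The volumes and hierarchy below are invented; real values come from SCADA data:

```python
# Hypothetical gross inflow per pump station (e.g. m3 per day).
gross_inflow = {"PS1": 40.0, "PS2": 25.0, "PS3": 90.0}
# Hypothetical hierarchy: PS1 and PS2 pump into PS3's catchment.
upstream_of = {"PS3": ["PS1", "PS2"]}

def net_inflow(station):
    """Net inflow = gross inflow minus inflow already counted upstream."""
    return gross_inflow[station] - sum(
        gross_inflow[s] for s in upstream_of.get(station, [])
    )
```

For PS3, 65 of the 90 units were already handled upstream, so only 25 originate within PS3's own catchment; that is where measures would be prioritised.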
InfoTiles SewerIntelligence calculates the following approximately in real time per pump station:
- Total volume of water pumped at each pump station
- Expected sewage volume (baseflow) at each pump station
- Volume and share of inflow in near real time, per hour and per pump station, including:
  - Total inflow volume transported
  - Net inflow volume originating within the pump's own catchment area, after upstream contributions are subtracted
Data sources: The SewerIntelligence module calculates inflow volumes using historical and real-time SCADA data (flow and level measurements) together with rainfall data. Measurement inputs used include: flow meters in pump stations, level sensors in pump sumps and at overflows, level sensors in flumes in the network, and flow meters at treatment plant inlets. The method is designed to work with only level measurements, as these are the minimum variable always available. If flow meters are available — in pumps or in the network — these can also be used, but are not required.
Expected baseflow per time unit per weekday (V_n) is determined using a combination of manual labelling and statistical analysis of raw data. To avoid being skewed by abnormal years (dry or wet), at least 2 years of data is preferred so that each season is represented. In the absence of long training data, the driest available scenario will be used as the baseline for inflow calculation.
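The dry-day baseline idea can be sketched as follows. The readings are invented, and the real model combines this kind of statistic with manual labelling and longer training periods:

```python
from statistics import median

# Hypothetical readings: (weekday 0=Monday, pumped volume, was_dry).
readings = [
    (0, 100.0, True), (0, 104.0, True), (0, 310.0, False),  # rain day
    (1, 95.0, True),  (1, 99.0, True),
]

def baseflow_per_weekday(readings):
    """Estimate expected baseflow V_n per weekday from dry days only,
    so that wet-weather inflow does not skew the baseline."""
    dry = {}
    for weekday, volume, was_dry in readings:
        if was_dry:
            dry.setdefault(weekday, []).append(volume)
    return {wd: median(vols) for wd, vols in dry.items()}

baseflow = baseflow_per_weekday(readings)
```

Volumes above the weekday baseline are then interpreted as inflow/infiltration, as described above.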
For a detailed explanation of SewerIntelligence outputs and how to interpret them, see: SewerIntelligence — Understanding the Output.
Identifying likely sources of inflow
Starting from a pump station where inflow is to be analysed, you can navigate directly to all upstream components connected to the selected pump. The analysis shows, among other things, distribution of ownership, estimated rehabilitation costs for selected elements, risk calculation per pipe component (based on component properties), age, material types, and other relevant parameters.
Components with the highest probability of failure are marked in red on the map and summarised in an associated table. In the work of reducing inflow, these calculations can be used as a basis for:
- Planning and prioritising inspections, with a focus on high-risk components
- Identifying new measurement points to detect inflow (e.g. at network branches to confirm or rule out inflow in delimited areas)
- Planning rehabilitation with assessment of cost and benefit