SolarMap.PH

methodology · v1 · 2026-05

From four satellite channels to one map color.

Updated each quarter when caveats change. The pipeline is a single Python script (pipeline/pipeline.py) reproducible by anyone with Earth Engine access.

flowchart LR
  S2NIR["Sentinel-2 B8<br/>NIR delta"]
  S2SWIR["Sentinel-2 B11<br/>SWIR delta"]
  LST["Landsat 8/9 ST_B10<br/>LST anomaly"]
  VIIRS["VIIRS DNB<br/>nightlight delta"]
  MASK["ESA WorldCover<br/>built-up mask"]
  Z["z-score per signal<br/>across 61 cities"]
  W["weighted sum<br/>-0.40 * NIR, -0.30 * SWIR<br/>-0.20 * LST, +0.10 * NL"]
  MAP["choropleth fill"]
  S2NIR --> Z
  S2SWIR --> Z
  LST --> Z
  VIIRS --> Z
  MASK -.->|restricts pixels| Z
  Z --> W
  W --> MAP
fig 01 · 4 raw signals, z-scored across 61 cities, weighted, mapped. Negative weights on NIR / SWIR / LST encode that darker or cooler imagery indicates more solar; the composite always reads higher = more solar-like.

What SolarMap.PH measures

A composite of four satellite-derived signals, computed at city/municipality scale across Meralco's franchise area each quarter.

How signals become a composite

Each city's raw signal is z-scored across all cities in the franchise. Composite = -0.40 × z_nir + -0.30 × z_swir + -0.20 × z_lst + 0.10 × z_nightlight. The negative weights on NIR/SWIR/LST reflect that "darker" or "cooler" indicates more solar. Positive z-scores on nightlight are a load-shape proxy. Initial weights are v1 priors based on PV remote-sensing literature; if/when ground-truth labels are obtained, weights become tunable.
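The z-score-and-weight step can be sketched in a few lines. This is a minimal illustration of the v1 formula; the weights come from the text above, but the array values are made-up placeholders, not real city data.

```python
import numpy as np

# Weights from the v1 composite; negative = "darker/cooler reads as more solar".
WEIGHTS = {"nir": -0.40, "swir": -0.30, "lst": -0.20, "nightlight": 0.10}

def zscore(x: np.ndarray) -> np.ndarray:
    """z-score a raw signal across all cities in the franchise."""
    return (x - x.mean()) / x.std()

def composite(signals: dict) -> np.ndarray:
    """Weighted sum of per-city z-scores; higher = more solar-like."""
    return sum(w * zscore(signals[k]) for k, w in WEIGHTS.items())

# Three toy cities: city 0 is darkest, coolest, and brightest at night,
# so it should score highest on the composite.
toy = {
    "nir":        np.array([0.10, 0.30, 0.50]),
    "swir":       np.array([0.12, 0.25, 0.40]),
    "lst":        np.array([30.0, 32.0, 34.0]),
    "nightlight": np.array([5.0, 3.0, 1.0]),
}
scores = composite(toy)
```

Because every signal is z-scored across the same set of cities, the composite is scale-free: swapping a raw signal's units leaves the map ranking unchanged.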

Cloud and noise handling

Aggregation unit

City/municipality level is the headline scale (~50 polygons). Barangay drilldown is computed only where built-up area exceeds 1 km²; below that, the signal is too noisy for a useful claim.

What SolarMap.PH does not claim

Data joiners

Reproducibility

The pipeline is one Python script (pipeline/pipeline.py) plus a validator. Earth Engine auth via service account. Anyone with EE access and the repo can run it and produce the same outputs. The full algorithm, weights, and data files are MIT/CC-BY licensed.

Ground-truth validation, and why the pin layer was dropped (2026Q2)

We hand-checked all hot-spot pins from the 2026Q2 release against Esri World Imagery (~1-3 yr vintage in PH) and ran a follow-up extraction with multi-channel filtering. The pin layer was retired before launch when both passes returned precision below the threshold for an honest claim.

v1.0: NIR-only local-anomaly extraction

We hand-checked all 181 hot-spot pins from the 2026Q2 release against Esri World Imagery (~1-3 year vintage in PH) by inspecting a 240m-wide tile centered on each hot-spot's lat/lon. Labels:

Label              Count   % of reviewable
solar              6       3.6%
unclear            16      9.5%
not_solar          147     87.0%
missing imagery    12      excluded
total              181     169 reviewable
tab 01 · ground-truth labels for 2026Q2 hot-spots, single reviewer, 2026-05-08. Full per-hotspot labels at docs/groundtruth/labels.json.

Strict precision (treat unclear as not-solar): 6/169 = 3.6%.
Lenient precision (treat unclear as solar): 22/169 = 13.0%.

Both numbers are low. Even at the most generous interpretation, fewer than 1 in 5 hot-spots show visible solar panels in the reviewable hi-res imagery.
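The arithmetic behind the two headline numbers, straight from the table counts:

```python
# Strict vs lenient precision from the 2026Q2 ground-truth table.
solar, unclear, not_solar = 6, 16, 147
reviewable = solar + unclear + not_solar      # 12 missing-imagery tiles excluded

strict = solar / reviewable                   # unclear counted as not-solar
lenient = (solar + unclear) / reviewable      # unclear counted as solar
```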

What that actually means

v1.1: multi-channel AND extraction

We tried tightening: require BOTH NIR drop AND SWIR drop (the same fingerprint used in the diff visualization), top-1 per city, larger minimum cluster size. v1.1 produced 61 hot-spots from 60 cities. After re-validation against fresh Esri tiles, strict precision was 1/57 = 1.75%, worse than v1.0. The multi-channel filter selected different clusters but they weren't more solar-y. Asphalt resurfacing, fresh concrete, and roof material changes all darken in BOTH NIR and SWIR. The AND was less specific than we hoped.
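The v1.1 AND filter reduces to a pair of boolean masks over the quarter-to-quarter delta rasters. A minimal sketch, with made-up thresholds (the production values are not stated here):

```python
import numpy as np

# Illustrative v1.1-style AND filter: flag pixels whose NIR and SWIR both
# darkened past a threshold. Thresholds below are placeholders.
NIR_DROP, SWIR_DROP = -0.05, -0.04

def and_candidates(d_nir: np.ndarray, d_swir: np.ndarray) -> np.ndarray:
    """True where BOTH channels darkened past their thresholds."""
    return (d_nir < NIR_DROP) & (d_swir < SWIR_DROP)

# Toy 2x2 delta rasters: only the top-left pixel darkens in both bands.
# Asphalt and fresh concrete also darken in both, which is exactly why
# this conjunction was less specific than hoped.
d_nir = np.array([[-0.10, -0.10], [0.02, -0.01]])
d_swir = np.array([[-0.08, 0.01], [-0.09, 0.00]])
mask = and_candidates(d_nir, d_swir)
```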

Conclusion: pixel-level NIR/SWIR differencing at Sentinel-2's 10m resolution isn't specific enough to claim per-location solar. The signal exists at city-aggregate scale (multi-signal composites average out per-pixel noise), but individual pixels and small clusters are too noisy to publish as identification claims.

What we kept and what we dropped

How to use SolarMap.PH honestly

CNN-based per-tile detection (v3)

The Sentinel-2 pipeline above gives a useful city-level signal but is structurally too coarse to claim solar at a specific roof (one S2 pixel = 100 m²; a typical residential 5 kWp install fits inside a single pixel). To make per-roof claims defensible we built a separate detection layer on hi-res Esri imagery and a CNN classifier.

The ML stack, end to end

Three layers, in order: a frozen CLIP-ViT-L/14 encoder turns each 600 × 600 px Esri tile into a 768-dim embedding; a logistic-regression head turns embeddings into a tile-level solar probability; a SAM-based localizer (covered below) turns each high-confidence tile into a per-building panel polygon. Training is offline, ~30 min on an M-series Mac. Inference over all of NCR is ~13 min (cold) or ~3 min (when re-classifying a cached tile set).

flowchart TB
  subgraph TRAIN[Training]
    direction TB
    OSM["OSM Overpass<br/>312 solar locations"]
    GT["6 hand-verified<br/>case studies"]
    NEG["46 GT not_solar<br/>+ 154 random NCR"]
    FETCH["Esri tile fetch<br/>240 m view, 0.4 m/px"]
    AUG["4x augmentation<br/>rotation, flip, jitter"]
    EMB["CLIP-ViT-L/14 frozen<br/>768-dim embedding"]
    LR["Logistic Regression<br/>class-balanced, C=1.0"]
    CV["5-fold group-aware CV"]
    CLF["clf_v3.joblib"]
    OSM --> FETCH
    GT --> FETCH
    NEG --> FETCH
    FETCH --> AUG
    AUG --> EMB
    EMB --> LR
    LR --> CV
    CV --> CLF
  end
  subgraph SCAN[NCR scan]
    direction TB
    GRID["240 m grid over NCR<br/>16,544 tiles"]
    SCANEMB["CLIP embedding"]
    SCORE["clf_v3 score per tile"]
    HIGH["high tier<br/>score 0.85 and up"]
    CAND["candidate tier<br/>score 0.70 to 0.85"]
    GEO["rooftop_solar_ncr.geojson<br/>280 high + 235 candidate"]
    GRID --> SCANEMB
    SCANEMB --> SCORE
    SCORE --> HIGH
    SCORE --> CAND
    HIGH --> GEO
    CAND --> GEO
  end
  CLF -.->|loaded for inference| SCORE
fig 02 · ML training and inference. The logistic head sits on top of frozen CLIP features so the same encoder serves training and inference; only the small joblib-pickled head needs to be redeployed when labels change.

Pipeline

  1. Positive set bootstrap. We queried OpenStreetMap (Overpass) for nodes and ways tagged generator:source=solar AND location=roof within the Meralco franchise bounding box (14.0–15.2 N, 120.6–121.5 E). 312 verified rooftop-solar locations, deduped to 294 unique buildings at ~10 m clustering. We added the 6 hand-verified case studies on top.
  2. Tile fetch. For each location, fetch a 600×600 px Esri World Imagery tile centered on the OSM lat/lon, covering ~240 m × ~240 m at ~0.4 m/pixel.
  3. Negative set. Re-used the 46 hand-labeled not_solar tiles from the ground-truth pass, plus 154 random tiles drawn uniformly from the NCR bounding box. Total 200 negative sources.
  4. Encoder. CLIP-ViT-L/14 (openai/clip-vit-large-patch14), frozen. We tried two pretrained DeepSolar-flavored Hugging Face Mask2Former models first; both saturated the segmentation mask at 100% on every image and were unusable. CLIP zero-shot with a prompt ensemble gave F1 ≈ 0.5. Linear-on-CLIP-features beats both.
  5. Augmentation. 4× per source (rotation 0/90/180/270, optional horizontal flip, mild brightness ±15% and color ±10% jitter). 2,500 rows total: 1,500 positive, 1,000 negative.
  6. Classifier. Logistic regression with class-balanced weights on the 768-dim CLIP image embedding. Trained for 2,000 max iterations, regularization C=1.0.
  7. Validation. 5-fold group-aware cross-validation. Each source's augmentations are held out together (no within-source leakage). Per-source max-score aggregation.
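Steps 6-7 can be sketched with scikit-learn. Random vectors stand in for the real CLIP-ViT-L/14 embeddings (loading the encoder is out of scope here); the shapes, hyperparameters, and group-aware folding follow the list above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score

# Random stand-ins for CLIP features: 4 augmentations per source,
# one label per source, 768-dim embeddings.
rng = np.random.default_rng(0)
n_sources, n_aug = 125, 4
X = rng.normal(size=(n_sources * n_aug, 768))
y = np.repeat(rng.integers(0, 2, n_sources), n_aug)     # label per source
groups = np.repeat(np.arange(n_sources), n_aug)         # augs stay together

head = LogisticRegression(class_weight="balanced", C=1.0, max_iter=2000)
# GroupKFold keeps every augmentation of a source in the same fold,
# so no within-source leakage inflates the CV score.
scores = cross_val_score(head, X, y, groups=groups, cv=GroupKFold(n_splits=5))
```

The per-source max-score aggregation mentioned in step 7 would sit on top of this: score all four augmentations of a held-out source, keep the max.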

Results (v2 baseline)

v3 cleanup pass: label-noise removed

v2 surfaced its own label noise: 4 random-NCR tiles that scored above 0.85 (rneg_0074, rneg_0086, rneg_0116, rneg_0136) were unmistakably real solar when checked by eye. v2 also kept the 2 noisy case studies as positives. We retrained clf_v3 with those 4 promoted to positives and the 2 noisy cases dropped: pure label cleanup, no architectural changes.

The "false positives" are real positives

The 4 highest-scored "negatives" in the validation set (random NCR tiles) all contain visible rooftop solar arrays when checked by eye. These were never in OSM but the classifier found them. Two examples:

The LOSO precision number quoted above (98% at threshold 0.85) is a lower bound. The "false positives" inflating the negative class are mostly OSM-coverage gaps. The CNN's value sits there: it surfaces solar the OSM community hasn't yet mapped.

v4: a second active-learning round + honest holdout calibration

The v3 CV precision is biased upward: data augmentation creates near-duplicate variants of the same source image, and group-aware folds can leak similarity through the augmentation chain. To get an honest precision number we ran another active-learning round and split out a never-trained 20% holdout for Platt-sigmoid calibration.
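Platt-sigmoid calibration is just a 1-D logistic regression fit on the raw head scores of the never-trained holdout. A minimal sketch with toy scores and labels standing in for the real 20% holdout:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy holdout: raw classifier scores and their ground-truth labels.
raw = np.array([0.05, 0.15, 0.30, 0.55, 0.70, 0.80, 0.90, 0.97])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Platt scaling: fit a sigmoid mapping raw score -> probability.
platt = LogisticRegression()
platt.fit(raw.reshape(-1, 1), labels)

def calibrated(score: float) -> float:
    """Map a raw head score to a calibrated probability."""
    return float(platt.predict_proba([[score]])[0, 1])
```

Because the holdout was never used for training or augmentation, the calibrated probabilities are not inflated by the near-duplicate leakage that biases the CV numbers.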

Encoder ablation: CLIP wins

Before locking the encoder, we ran the same training set through two alternative encoders with the same head and the same Platt holdout:

The aerial-pretrained encoder underperforming the general-purpose one was the most surprising result. CLIP's web-scale text-paired image pretraining apparently encodes "is this a solar panel array" better than satlas's narrower aerial-task pretraining. CLIP-ViT-L stays locked.

Cross-match against OpenStreetMap

For every CNN detection we look up the nearest OSM-tagged solar location and treat detections within 200 m as "confirmed by OSM" (model rediscovers known solar) and detections farther away as "new" (model proposes solar OSM has not tagged). On the production v4 round-3 NCR scan (16,544 tiles on a 240 m grid, 515 detections; a franchise extension to the Bulacan / Cavite / Rizal / Laguna industrial belts is queued for a future quarter):

This is a useful sanity check for the classifier. If almost every detection re-mapped an OSM point, the model would just be memorizing training data; if none did, the model would be hallucinating. The ~13% confirmed / ~87% new split at the high tier is consistent with a classifier that learned the visual signature of rooftop PV well enough to recognize it on buildings the OSM community has not yet mapped, in a region where OSM rooftop-solar tagging is sparse. "Within 200 m" is the same proximity convention used by DeepSolar (Stanford, 2018) and SPECTRUM (ICSC, 2025).
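The confirmed/new split reduces to a nearest-neighbor distance check. A self-contained sketch of the 200 m convention, with illustrative coordinates rather than real detections:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

def label(detection, osm_points, radius_m=200.0):
    """'confirmed' if any OSM-tagged solar point lies within radius_m."""
    nearest = min((haversine_m(*detection, *p) for p in osm_points),
                  default=float("inf"))
    return "confirmed" if nearest <= radius_m else "new"

osm = [(14.5995, 120.9842)]            # one known OSM solar point (toy)
a = label((14.5996, 120.9843), osm)    # ~15 m away -> confirmed
b = label((14.6500, 121.0500), osm)    # several km away -> new
```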

The "candidate" tier is noisier: many genuine borderline cases plus more visually-ambiguous false positives. The high-confidence tier is the production set.

Coverage scan

We tile NCR (14.20-14.85 N, 120.88-121.22 E) on a 240 m grid, no overlap: 16,544 tiles. A Phase 5 extension to the Bulacan / Cavite / Rizal / Laguna industrial belts (45,752 tiles total) is staged in detection/scan/luzon_scan.py and queued for a future quarter. Each tile is fetched from Esri, embedded with CLIP-ViT-L, and scored by the classifier. Results at score ≥ 0.70 are written to site/public/data/rooftop_solar_ncr.geojson with a tier property (high for ≥ 0.85, candidate otherwise). The "your roof" page surfaces detections within 1.5 km of the user's geocoded address.
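A no-overlap 240 m grid over a lat/lon bounding box can be generated as below. This is an illustrative sketch, with per-latitude degree conversions; it is not the production tiler and is not guaranteed to reproduce the published 16,544-tile count exactly.

```python
from math import cos, radians

def grid_centers(lat_min, lat_max, lon_min, lon_max, step_m=240.0):
    """Centers of a step_m x step_m no-overlap grid over a bbox."""
    m_per_deg_lat = 111_320.0
    dlat = step_m / m_per_deg_lat
    centers = []
    lat = lat_min + dlat / 2
    while lat < lat_max:
        # Longitude degrees shrink with latitude.
        dlon = step_m / (m_per_deg_lat * cos(radians(lat)))
        lon = lon_min + dlon / 2
        while lon < lon_max:
            centers.append((round(lat, 6), round(lon, 6)))
            lon += dlon
        lat += dlat
    return centers

tiles = grid_centers(14.20, 14.85, 120.88, 121.22)   # NCR bbox from the text
```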

The base detection layer outputs tile-level Points (the 240 m square center). v2.1 (next section) localizes panels at building-polygon resolution.

v2.1 Building-level panel localization

Tile-level detection answers "is there solar in this 240m grid square." It cannot answer "which roof," and it cannot estimate kWp installed without a footprint. v2.1 adds panel-polygon resolution by combining three signals.

flowchart TB
  TILE["High-confidence tile<br/>from clf_v3 scan"]
  SAM["SAM ViT-B<br/>auto-mask generator<br/>~150 masks per tile"]
  COLOR["Color signature filter<br/>brightness 25 to 140<br/>blue bias -25 to +60"]
  CROP["192 px context crop<br/>centered on each mask"]
  CEMB["CLIP-ViT-L embed"]
  LR2["clf_v3 score"]
  COMB["combined score<br/>0.6 * CLIP + 0.4 * color"]
  KEEP["keep if combined >= 0.70"]
  OSM2["OSM buildings within 200 m<br/>via Overpass"]
  MATCH["Centroid-in-polygon<br/>or bbox overlap"]
  MERGE["Per-building merge<br/>sum panel area<br/>cap at footprint"]
  KWP["kWp estimate<br/>area_m2 divided by 6"]
  OUT["per_building_solar_ncr.geojson<br/>384 buildings, 69.9 MWp"]
  TILE --> SAM
  SAM --> COLOR
  COLOR -->|drops ~70% of masks| CROP
  CROP --> CEMB
  CEMB --> LR2
  LR2 --> COMB
  COMB --> KEEP
  KEEP --> MATCH
  OSM2 --> MATCH
  MATCH --> MERGE
  MERGE --> KWP
  KWP --> OUT
fig 03 · v2.1 turns each high-confidence tile into one feature per OSM building with detected panels. SAM produces many candidate masks; the color filter is fast and removes most non-panel masks before CLIP scoring; the per-building merge collapses over-segmented arrays back to one polygon per building.
  1. SAM (Segment Anything, ViT-B checkpoint). For each high-confidence v3 tile we run SAM's automatic mask generator. SAM is class-agnostic: it carves the image into ~150-200 generic regions per tile (every roof slope, road, treeline becomes its own mask).
  2. Color signature filter. Solar panels in PH aerial imagery are dark with neutral-to-blue tone. We compute per-mask mean RGB and reject masks whose brightness or blue bias is incompatible with panels (light roofs, vegetation, asphalt, brightly-tarpaulined informal roofs all get filtered out before the next step).
  3. CLIP+LR scoring on 192 px context window. Each surviving mask is centered in a 192×192 px window (~76 m), embedded with CLIP-ViT-L, and scored by clf_v3. Final segment confidence is 0.6 × CLIP_score + 0.4 × color_score. We keep segments with combined confidence ≥ 0.70.
  4. OSM building intersection. Each kept segment polygon is matched against OSM building footprints around the tile (Overpass query with 200 m radius). The segment is assigned to the building containing its centroid (or, if outside any building, to the building with largest bbox overlap).
  5. Per-building merge. SAM frequently over-segments a single rooftop array into 8-30 pieces; without grouping the same SM Mall installation would emit 30 features. We group all segments by OSM building id, sum their panel area (capped at the building footprint), use the highest-confidence segment as the displayed polygon, and keep the max segment confidence as the building's score.
  6. kWp estimate. Industry rule of thumb: ~6 m² of crystalline-silicon panel per kWp installed. kwp_estimate = panel_area_m² / 6. This is approximate: efficiency varies, mounting density varies, and SAM's polygon often clips the array boundary by ±10%.
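Steps 5-6 (per-building merge and kWp estimate) can be sketched as a group-by over kept segments. The segment dicts and footprint areas below are illustrative; property names mirror the output schema described in the text.

```python
from collections import defaultdict

M2_PER_KWP = 6.0   # ~6 m^2 of crystalline-silicon panel per kWp

def merge_buildings(segments, footprints_m2):
    """One record per OSM building: summed capped area, max confidence."""
    by_bldg = defaultdict(list)
    for seg in segments:
        by_bldg[seg["building_osm_id"]].append(seg)
    out = []
    for bid, segs in by_bldg.items():
        # Sum panel area across segments, capped at the building footprint.
        area = min(sum(s["area_m2"] for s in segs), footprints_m2[bid])
        out.append({
            "building_osm_id": bid,
            "panel_area_m2": area,
            "kwp_estimate": area / M2_PER_KWP,
            "confidence": max(s["confidence"] for s in segs),
            "n_segments_merged": len(segs),
        })
    return out

segs = [
    {"building_osm_id": 1, "area_m2": 80.0, "confidence": 0.91},
    {"building_osm_id": 1, "area_m2": 50.0, "confidence": 0.78},  # over-segmented
    {"building_osm_id": 2, "area_m2": 30.0, "confidence": 0.72},
]
merged = merge_buildings(segs, footprints_m2={1: 100.0, 2: 400.0})
```

Note how building 1's two over-segmented pieces sum past its 100 m² footprint and get capped, which is exactly the failure mode the cap exists for.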

Output: site/public/data/per_building_solar_ncr.geojson, one Feature per building with detected panels. Properties include building_osm_id, building_type, panel_area_m2, kwp_estimate, confidence, n_segments_merged. The "your roof" tool does an osm_id lookup against this file: when the user's address matches a building that v2.1 found panels on, the result page surfaces a "your roof MAY already have solar" banner with the detection thumbnail and estimated kWp.
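The "your roof" lookup is a plain id index over the GeoJSON properties. A sketch, with an inline one-feature FeatureCollection standing in for the real per_building_solar_ncr.geojson (the banner wording and 12.5 kWp value are illustrative):

```python
import json

doc = json.loads("""{
  "type": "FeatureCollection",
  "features": [{
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [121.0, 14.6]},
    "properties": {"building_osm_id": 123, "kwp_estimate": 12.5, "confidence": 0.9}
  }]
}""")

# Index detections by OSM building id for O(1) lookup.
by_osm_id = {f["properties"]["building_osm_id"]: f["properties"]
             for f in doc["features"]}

def roof_banner(osm_id):
    """Banner text if the user's matched building has a detection, else None."""
    hit = by_osm_id.get(osm_id)
    if hit is None:
        return None
    return f"your roof MAY already have solar (~{hit['kwp_estimate']} kWp est.)"
```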

What this is not: a verified panel-area number. SAM's polygons are approximate (contour-of-a-mask, not a precise rectified panel outline); CLIP+LR was trained at tile scale and applied at segment scale; OSM building footprints in PH are uneven. Treat the kWp number as ±20–30%. The high-value claim is "there is solar on this specific OSM building"; the m² and kWp are useful framing, not audit-grade.

Honest limitations

References