# WordPress Importer — Improvement & Growth Plan

> Package: capell-app/wordpress-importer · Kind: package · Tier: premium · Product group: Capell Operations · Bundle: operations · Status: Draft

## 1. Snapshot

WordPress Importer is a focused Migration Assistant source adapter: it registers one `WxrReader` (`src/Services/WxrReader.php`) into Migration Assistant's `ImportSourceRegistry` (prepended ahead of MA's own `XmlReader`) so `.xml` uploads are sniffed for a WordPress WXR namespace and parsed into MA's neutral `ExternalImportReadResult` row shape. The package also ships a headless preview command (`src/Console/Commands/ImportWordPressWxrCommand.php`), translated Diagnostics probes (`src/Health/WordpressImporterHealthCheck.php`), and WXR preview mapping data/actions. It declares **no migrations, no settings, no permissions, no Filament resources** (`capell.json` `database`/`settings`/`permissions` all empty); the admin surface is still Migration Assistant's import workflow, while the console surface is justified by the package-owned preview command. Relationship to migration-assistant: hard `requires` dependency; WordPress Importer is a pure **consumer/extension** of MA contracts (`ImportSourceReader`, `ExternalImportReadResult`, `SafeXmlLoader`, `XmlReader`) and owns none of the import session, final execution, or rollback machinery.

Current marketplace summary (verbatim): _"Preview WordPress WXR posts and pages in Capell with durable metadata ready for Migration Assistant mapping."_ Screenshot count in `capell.json.marketplace.screenshots`: **4** (`extension-card.jpg`, `wordpress-wxr-source-selection.png`, `wordpress-wxr-preview.png`, `wordpress-import-session.png`). The screenshot contract and marketplace manifest are reconciled: the three WXR workflow PNGs named by `docs/screenshots.json` exist on disk, and the manifest no longer promotes older static hero assets.

## 2. Improvements (existing functionality)

1. **Done/Shipped: Make the Migration Assistant registry path-aware for WXR selection** — `WxrReader::supportsPath()` stream-sniffs readable XML paths and refuses generic XML, and MA's `ImportSourceRegistry` now asks path-aware readers before extension fallback. Generic XML remains owned by Migration Assistant's `XmlReader`, including extension-only `export.xml` lookups, so the WordPress adapter can no longer mask a missing generic XML reader — `src/Services/WxrReader.php`, `src/Actions/BuildWordPressImportPreviewAction.php`, migration-assistant registry — M.
2. **Done/Shipped: Keep non-WXR XML out of the WordPress reader** — `WxrReader::read()` now rejects non-WXR XML instead of building a generic XML fallback. This keeps the parser boundary clear: WordPress Importer parses WordPress exports only; Migration Assistant parses generic XML — `src/Services/WxrReader.php`, migration-assistant registry — S.
3. **Surface CDATA/excerpt edge cases as columns consistently** — why: `columnsFor()` derives columns from the union of row keys (`src/Services/WxrReader.php:147-150`), but every row always has the same fixed keys from `rowFromItem()`, so the union logic is dead complexity; a static column list would be clearer and cheaper — `src/Services/WxrReader.php` — S.
4. **Done/Shipped: health check now asserts real runtime requirements** — `WordpressImporterHealthCheck::runDiagnostics()` reports SimpleXML availability, WXR reader registration in Migration Assistant's registry, and Migration Assistant reader contract compatibility; focused health coverage exists in `tests/Unit/Health/WordpressImporterHealthCheckTest.php` — `src/Health/WordpressImporterHealthCheck.php` — M.
5. **Done/Shipped: PHP/runtime metadata is aligned with the repo baseline** — root `composer.json`, `composer.local.json`, and `packages/wordpress-importer/composer.json` all require `php: ^8.3`, matching the repo's PHP 8.3 minimum; do not pin this package to `^8.4` unless the whole Capell 4 package baseline moves together — `composer.json:7` — S.
6. **README "Built With" omits the hard `ext-simplexml` requirement from its dependency narrative while listing it elsewhere** — minor doc consistency; ensure the parser dependency is stated once authoritatively — `README.md` / `composer.json:8` — S.
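
The path-aware selection described in items 1 and 2 can be illustrated with a small sketch. It is written in Python for brevity rather than the package's PHP, and it is not the actual `ImportSourceRegistry` code: the `Registry` class, the 64 KiB sniff window, and the rule that path-aware readers are skipped during extension fallback are assumptions made for illustration. The only detail taken from the WXR format itself is that exports declare a `wp` namespace under `wordpress.org/export/`.

```python
from pathlib import Path

# Substring of the WXR namespace URI, e.g. xmlns:wp="http://wordpress.org/export/1.2/".
WXR_MARKER = b"wordpress.org/export/"
SNIFF_BYTES = 64 * 1024  # hypothetical sniff window, not a documented value


class GenericXmlReader:
    """Stand-in for Migration Assistant's generic XML reader (extension-based only)."""
    name = "xml"
    extensions = (".xml",)


class WxrReader:
    """Stand-in for the WordPress reader: path-aware, refuses generic XML."""
    name = "wordpress-wxr"
    extensions = (".xml",)

    def supports_path(self, path: Path) -> bool:
        # Unreadable or missing paths cannot be sniffed, so they are refused.
        if not path.is_file():
            return False
        with path.open("rb") as handle:
            return WXR_MARKER in handle.read(SNIFF_BYTES)


class Registry:
    """Hypothetical registry: path-aware readers first, then extension fallback."""

    def __init__(self) -> None:
        self.readers = []

    def register(self, reader) -> None:
        self.readers.append(reader)

    def prepend(self, reader) -> None:
        self.readers.insert(0, reader)

    def resolve(self, path: Path):
        # Pass 1: ask every path-aware reader whether it recognises the content.
        for reader in self.readers:
            sniff = getattr(reader, "supports_path", None)
            if sniff is not None and sniff(path):
                return reader
        # Pass 2: extension fallback, limited to readers that are not path-aware,
        # so a sniffing reader can never mask a missing generic reader.
        for reader in self.readers:
            if hasattr(reader, "supports_path"):
                continue
            if path.suffix.lower() in reader.extensions:
                return reader
        return None
```

Under these assumptions, a file containing the WXR namespace resolves to the WordPress reader, generic XML and extension-only lookups such as a not-yet-uploaded `export.xml` resolve to the generic reader, and a registry that lacks the generic reader returns nothing for generic XML instead of silently handing it to the WordPress adapter.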

## 3. Missing Features (gaps)

The package advertises focused WXR reader/preview capabilities in `capell.json` and deliberately avoids claiming package-owned admin resources or final migration execution. Against WordPress-import norms:

- **Done/Shipped for preview/console: extracted WordPress fields now map into explicit `meta.wordpress.*` metadata.** `BuildWordPressImportPreviewAction` owns the WXR-specific field mapping for the package's headless preview/console path, so categories, tags, author login, media URLs, old permalinks, IDs, Gutenberg/shortcode signals, and related WordPress metadata no longer fall into generic `meta.imported.*`. The mapping alone does not resolve taxonomies or authors and does not provide an admin mapping UI; page execution and media ingest are tracked in the bullets that follow. Evidence: `ImportWordPressWxrCommand` delegates to the Action, and `WxrReaderTest` asserts explicit `meta.wordpress.*` output with no `meta.imported` fallback. — `src/Actions/BuildWordPressImportPreviewAction.php`, `src/Data/WordPressImportPreviewData.php`, `src/Console/Commands/ImportWordPressWxrCommand.php`, `tests/Unit/WxrReaderTest.php`
- **Done/Shipped for the core page execution path: external reader rows can now move beyond preview.** Migration Assistant owns `ExecuteExternalPageImportAction`, which accepts an `ExternalImportPreview`, validates required Capell page defaults, converts create descriptors into a package-like in-memory payload, restores slug-derived page URLs and source parent relationships, executes through `PageImportService`, stores an `ImportSession`, and creates rollback reports even when a failed report created pages. WordPress Importer remains a source adapter: its WXR preview can be passed into the MA action with target page defaults to create routable Pages. Remaining gaps are taxonomy/author resolution and admin wizard wiring; media ingest, permalink redirects, and preview idempotency have since shipped (see the media and redirect bullets in this section and the idempotency item in §4).
- **Done/Shipped: Media import + content URL rewrite after execution.** Completed Migration Assistant page imports now trigger `ImportWordPressMediaForPagesAction`, which downloads parsed `meta.wordpress.media_urls`, attaches them to the created Page's `wordpress-import` media collection, records `meta.wordpress.imported_media`, and rewrites exact WordPress media URLs in stored `meta.content` to local media URLs. Failed downloads are skipped without blocking the page import. — `src/Actions/ImportWordPressMediaForPagesAction.php`, `src/Listeners/ImportWordPressMediaForCompletedImport.php`, `tests/Unit/WxrReaderTest.php`
- **Gutenberg blocks / shortcodes passed through verbatim.** `post_content` is stored raw (`src/Services/WxrReader.php:93`); `<!-- wp:* -->` block comments and `[shortcode]` markup land untouched in `meta.content`. No converter to Capell components/blocks. Major differentiator.
- **Shipped 2026-06-08: permalink redirects, redirect reports, and rollback evidence are created after completed imports when URL Manager is available.** Completed Migration Assistant imports now trigger a WordPress Importer listener that loads created Pages, reads `meta.wordpress.old_permalink`/`link`, resolves the imported Capell Page URL, and upserts exact URL Manager redirects with a translated note. The listener stores a `wordpress_permalink_redirects` report on the import session and rollback report, and newly-created redirect rule IDs are merged into `created_models` so rollback execution can remove URL changes alongside imported pages. The integration is optional at runtime, so WordPress Importer still works without URL Manager installed. — `src/Actions/CreateWordPressPermalinkRedirectsAction.php`, `src/Data/WordPressPermalinkRedirectReportData.php`, `src/Listeners/CreateWordPressRedirectsForCompletedImport.php`, `src/Providers/WordPressImporterServiceProvider.php`, `capell.json`
- **Custom post types dropped.** `read()` hard-filters to `['page','post']` (`src/Services/WxrReader.php:42`); CPTs, attachments-as-posts, nav menus, and comments are skipped. No configurability.
- **Author mapping is a raw login string** (`dc:creator`), not resolved to a Capell user. No mapping UI.
- **Comments not imported** (norm for blog migrations; ties to the blog package).
- **Done/Shipped: console surface is justified.** `ImportWordPressWxrCommand` produces a headless Migration Assistant preview JSON payload for scripted migration audits, and the package provider registers it through Spatie package tools.
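
The media import and content rewrite step described above can be sketched as follows. This is an illustration in Python rather than the PHP of `ImportWordPressMediaForPagesAction`; the function name, the `download` callable, and the longest-URL-first replacement order are assumptions. The behaviors taken from the text are that failed downloads are skipped without blocking the import, that source/local URL pairs are recorded, and that only exact media URLs are rewritten.

```python
from typing import Callable, Optional


def import_media_and_rewrite(
    content: str,
    media_urls: list[str],
    download: Callable[[str], Optional[str]],
) -> tuple[str, list[dict[str, str]]]:
    """Download each media URL and rewrite exact matches in the stored content.

    `download` is a hypothetical callable returning the local media URL,
    or None (or raising OSError) when the download fails.
    """
    imported: list[dict[str, str]] = []
    for source_url in media_urls:
        try:
            local_url = download(source_url)
        except OSError:
            local_url = None
        if local_url is None:
            # A failed download is skipped; the page import itself continues.
            continue
        imported.append({"source_url": source_url, "local_url": local_url})

    # Replace longer URLs first so a URL that is a prefix of another
    # (for example "a.jpg" and "a.jpg.webp") cannot clobber the longer one.
    for pair in sorted(imported, key=lambda p: len(p["source_url"]), reverse=True):
        content = content.replace(pair["source_url"], pair["local_url"])

    return content, imported
```

The returned list plays the role of `meta.wordpress.imported_media`: URLs that failed to download stay untouched in the content, so the page still renders against the original WordPress asset until a retry succeeds.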

## 4. Issues / Risks

- **Done/Shipped: manifest capabilities match package ownership.** `capell.json` now declares WXR reader, preview, metadata, media-reference, Gutenberg/shortcode detection, and headless-preview capabilities instead of the old generic admin/console capability claims. The admin workflow remains Migration Assistant-owned while WordPress Importer contributes the source reader and preview metadata. — `capell.json`, `docs/overview.md`
- **Done/Shipped: health check severity now maps to real probes.** See §2.4. The critical health check now fails when SimpleXML is unavailable, the WXR reader is not registered with Migration Assistant, or the reader no longer implements the expected Migration Assistant contract — `src/Health/WordpressImporterHealthCheck.php`, `capell.json`.
- **Done/Shipped: screenshot manifest coverage is reconciled.** `docs/screenshots.json` names the source selection, WXR preview, and import session PNGs that now exist on disk, and `capell.json.marketplace.screenshots` includes those three workflow images plus the extension card, matching the four entries listed in §1; the older static hero assets are no longer promoted. Add a manifest↔asset consistency test if future screenshot drift becomes common — `docs/screenshots.json`, `capell.json`.
- **Done/Shipped: large WXR exports stream beyond Migration Assistant's DOM safety cap.** `WxrReader::read()` now uses `XMLReader` when available, scans metadata, parent attachments, and post/page rows in streaming passes, rejects DOCTYPE declarations, and preserves item-level isolation while only loading the current `<item>` into `SafeXmlLoader`. Focused coverage builds a WXR fixture larger than `SafeXmlLoader::DEFAULT_MAX_BYTES` and proves it reads successfully. — `src/Services/WxrReader.php`, `tests/Unit/WxrReaderTest.php`
- **Shipped 2026-06-06: malformed post/page items are isolated.** `WxrReader::read()` now catches item-level failures while parsing WordPress posts/pages, skips only the bad item, and records `metadata.item_errors` plus `metadata.skipped_item_count` on the read result so one malformed WXR item no longer aborts neighboring valid rows. Document-level XML/WXR failures still fail fast. — `src/Services/WxrReader.php`
- **DOCTYPE/XXE: mostly covered upstream, only partly owned.** Entity and network hardening rests on MA's `SafeXmlLoader` (DOCTYPE rejection, `LIBXML_NONET`, null entity loader), while the streaming path in `WxrReader::read()` adds its own DOCTYPE rejection (see the large-export item above). The package test `rejects WordPress WXR imports with doctype declarations` (`tests/Unit/WxrReaderTest.php:95`) asserts the rejection, which is good, but if MA ever relaxes `SafeXmlLoader`, any code path that still relies on it silently inherits the regression. Add a contract/regression test pinning the expectation.
- **Shipped 2026-06-06: preview de-duplicates previously imported WordPress source identities.** `BuildWordPressImportPreviewAction` now applies `ApplyWordPressPreviewIdempotencyAction` after field mapping; rows whose `meta.wordpress.source_identity` already exists on a Capell Page are marked `skip` with a translated reason, so a repeated WXR preview no longer proposes duplicate Page creation for already imported WordPress posts/pages. Remaining resumability depth is job-level resume/retry state beyond Migration Assistant's session retry support. — `src/Actions/ApplyWordPressPreviewIdempotencyAction.php`, `src/Actions/BuildWordPressImportPreviewAction.php`
- **Test coverage gaps.** `tests/Unit/WxrReaderTest.php` covers registration, WP posts/pages read, generic XML staying with Migration Assistant's XML reader, package-owned preview rejection for generic XML, DOCTYPE rejection, console preview JSON, explicit WordPress metadata mapping, and a WXR preview executing through `ExecuteExternalPageImportAction` into routable parent/child Pages, Page URLs, completed import session, media import, permalink redirect report, and rollback removal of created URL Manager redirects. `tests/Unit/Health/WordpressImporterHealthCheckTest.php` covers health diagnostics. **Not covered:** CPT filtering, multi-attachment items beyond the first inline URL plus parent attachments, empty/missing `wp:` namespace fallback path, oversize-file rejection, malformed item, category-vs-tag domain disambiguation under odd `domain` attrs, and manifest↔asset consistency.
- **Performance budget is nominal.** `capell.json.performance.adminQueryBudget: 40`, `frontendRenderBudgetMs: 0`, `cacheable:false` — reasonable for a console/parse package, but there's no test or benchmark enforcing parse-time on a representative export.
- **i18n.** Health and command copy is translated, but reader exception strings (`src/Services/WxrReader.php`) are hardcoded English `sprintf`s — acceptable for developer-facing exceptions, but admin-surfaced errors should be translatable per Capell conventions.
- **Done/Shipped: CHANGELOG now records package work.** `CHANGELOG.md` documents the June 3-4 copy, screenshot, command, WXR preview, health, and XML reader boundary changes; keep it current as execution/media/redirect work lands.
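
The streaming read with per-item isolation described in this section can be sketched as below. It is a Python illustration using `xml.etree.ElementTree.iterparse`, not the package's `XMLReader`-based PHP; the function names and the "non-numeric `wp:post_id`" failure condition are assumptions. The `metadata.item_errors` and `metadata.skipped_item_count` keys and the `page`/`post` filter come from the text. DOCTYPE rejection is deliberately out of scope here.

```python
import xml.etree.ElementTree as ET

WP = "{http://wordpress.org/export/1.2/}"  # WXR 1.2 namespace in ElementTree notation
ALLOWED_TYPES = {"page", "post"}


def row_from_item(item: ET.Element) -> dict:
    """Build one row; raise ValueError for an item that cannot be mapped."""
    post_id = item.findtext(f"{WP}post_id")
    if not post_id or not post_id.isdigit():
        # Hypothetical item-level failure used for illustration.
        raise ValueError("missing or non-numeric wp:post_id")
    return {
        "id": int(post_id),
        "title": item.findtext("title") or "",
        "type": item.findtext(f"{WP}post_type"),
    }


def read_wxr(stream) -> dict:
    """Stream <item> elements one at a time and isolate item-level failures."""
    rows, errors = [], []
    # Document-level malformed XML raises ET.ParseError and is not caught here,
    # so whole-document failures still fail fast.
    for _event, element in ET.iterparse(stream, events=("end",)):
        if element.tag != "item":
            continue
        try:
            if element.findtext(f"{WP}post_type") in ALLOWED_TYPES:
                rows.append(row_from_item(element))
        except ValueError as error:
            errors.append(str(error))  # skip only the bad item
        finally:
            element.clear()  # release the item's children before moving on
    return {
        "rows": rows,
        "metadata": {"item_errors": errors, "skipped_item_count": len(errors)},
    }
```

A caller would pass an open binary file handle, so memory stays bounded by the size of the largest single item rather than the whole export. The cleared `<item>` shells remain attached to the root in this sketch, which a production reader would also want to release.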

## 5. Marketplace & Selling

**Done/Shipped copy refresh.** Current marketplace and composer copy now consistently frames WordPress Importer as a Migration Assistant source with preview, preserved WordPress metadata, media import, URL Manager redirect preservation, redirect reporting, and rollback-safe handoff.

**Reference 1-sentence summary for future execution/media expansion:**

> Migrate your WordPress site into Capell — import posts, pages, and media from a standard WXR export, preview every change, and roll back safely.

**Reference 3–4 sentence description for future execution/media expansion:**

> WordPress Importer turns a standard WordPress WXR export into a guided Capell migration. It reads your posts and pages, maps WordPress fields onto Capell's page schema, and hands them to Migration Assistant for preview, validation, mapping, and one-click rollback — so you see exactly what will change before anything is written. Built to extend rather than replace, it slots into the Operations bundle alongside Migration Assistant and pairs with URL Manager to preserve your old permalinks and SEO equity. The fastest, lowest-risk path off WordPress and onto Capell.

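(Note: media import and URL Manager redirect preservation have shipped per §3, so this copy no longer overstates them; it should still avoid promising Gutenberg conversion, author mapping, or custom post types until those §3 gaps are closed.)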

**Done/Shipped screenshot/media reconciliation.** `docs/screenshots.json` names the WXR source selection, parsed preview, and import session screenshots; all three PNGs are committed under `docs/screenshots/`. `capell.json.marketplace.screenshots` now declares those workflow screenshots alongside the extension card, so buyer-facing marketplace media matches the documented screenshot contract.

**Pricing/tier/bundle positioning.** Tier `premium`, bundle `operations`, license `paid`/`first-party`/priority support (`capell.json.commercial`). WordPress import is a **top-of-funnel acquisition driver** — it is often the reason a prospect evaluates Capell at all. Consider positioning the _parser_ as a low-friction lead-in (even bundling read/preview free) and monetising the _execution + media + redirect_ layer as the premium hook, maximising trial-to-paid conversion. Keep it inside the Operations bundle so it cross-sells the bundle.

**Cross-sell (deps + Extension Suites).** Hard dep on **migration-assistant** (the engine, always co-sold). Natural attach: **url-manager** (WP permalinks → redirects; shipped as an optional runtime integration, see §3), **seo-suite** (redirect opportunity reports already consume `suggestedTargetUrl`), **media-library** (imported `wp:attachment_url` files already attach to the Page's `wordpress-import` media collection, see §3). Frame as a "WordPress Migration Suite."

**Differentiators / value props / target buyer.** Differentiators: permalink/redirect preservation and media ingest (shipped, see §3), plus Gutenberg-block→Capell-component conversion and author mapping once the remaining §3 gaps are delivered. Value props: lower switching cost, no SEO loss, full preview + rollback. Target buyer: agencies and site owners migrating off WordPress; the developer evaluating Capell as a WP replacement.

**Keywords/tags (8–12):** wordpress, wxr, migration, importer, cms-migration, content-migration, posts, pages, gutenberg, permalinks, seo-redirects, capell. (Composer currently lists only 5: `capell`, `cms`, `laravel`, `wordpress`, `migration` — `composer.json:38-44`.)

## 6. Prioritized Roadmap

| Item                                                                                                                                                                                                                                                                                                                                             | Bucket | Effort | Impact                | Section ref                                                                                                                                                                                                                                                  |
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------ | ------ | --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Done/Shipped for core execution: external WXR preview rows can execute into Pages through Migration Assistant sessions and rollback reports. Admin wizard wiring remains open; media, redirects, and idempotency shipped.                                                                                                                        | Done   | L      | Critical              | §3                                                                                                                                                                                                                                                           |
| Done/Shipped for preview/console: Map WP categories/tags/author/media instead of dumping to `meta.imported.*`. Evidence: `BuildWordPressImportPreviewAction` applies package-owned WXR mappings into `meta.wordpress.*`, and `WxrReaderTest` covers categories, tags, author login, media URLs, featured media, and no `meta.imported` fallback. | Done   | M      | High                  | §2.1, §3                                                                                                                                                                                                                                                     |
| Done/Shipped: Make Migration Assistant registry path-aware for WXR reader selection                                                                                                                                                                                                                                                              | Done   | M      | High                  | §2.1, §2.2                                                                                                                                                                                                                                                   |
| Done/Shipped: Reconcile screenshots (`screenshots.json` ↔ `capell.json`) and add real images                                                                                                                                                                                                                                                     | Done   | S      | High                  | §4, §5                                                                                                                                                                                                                                                       |
| Done/Shipped: Keep Migration Assistant naming consistent in description/docs                                                                                                                                                                                                                                                                     | Done   | S      | Med                   | §5                                                                                                                                                                                                                                                           |
| Done/Shipped: Implement real health-check probes (registry + ext-simplexml + reader contract)                                                                                                                                                                                                                                                    | Done   | M      | Med                   | §2.4, §4                                                                                                                                                                                                                                                     |
| Done/Shipped: Confirm PHP/runtime metadata stays on the repo-wide `^8.3` baseline after copy/keyword refresh                                                                                                                                                                                                                                     | Done   | S      | Med                   | §2.5, §5                                                                                                                                                                                                                                                     |
| Done/Shipped: Media import downloads `wp:attachment_url` into Page media + rewrites content URLs                                                                                                                                                                                                                                                 | Done   | L      | High                  | §3 — completed page imports now download parsed WordPress media URLs, attach them to `wordpress-import`, record source/local URL pairs, and rewrite stored `meta.content` away from old WordPress asset URLs.                                                |
| Shipped 2026-06-08: Permalink → redirect preservation, redirect report, and rollback evidence via URL Manager integration                                                                                                                                                                                                                        | Done   | M      | High                  | §3, §5 — completed imports now upsert exact redirects from WordPress old permalinks to imported Capell page URLs when URL Manager is installed, store a session/rollback redirect report, and add newly-created redirect rules to rollback `created_models`. |
| Shipped 2026-06-06: Idempotent re-import preview de-dupes by WordPress `source_identity`                                                                                                                                                                                                                                                         | Done   | S      | Med                   | §4 — repeated previews skip rows whose `meta.wordpress.source_identity` already exists on a Capell Page.                                                                                                                                                     |
| Shipped 2026-06-06: Per-item error isolation for malformed WXR items before preview build                                                                                                                                                                                                                                                        | Done   | M      | Med                   | §4 — malformed post/page items are skipped with read-result metadata while valid neighboring rows continue.                                                                                                                                                  |
| Done/Shipped: Streaming `XMLReader` parser for exports > 50MB                                                                                                                                                                                                                                                                                    | Done   | L      | Med                   | §4 — WXR reads now stream metadata, attachments, and post/page rows in separate passes beyond Migration Assistant's DOM safety cap while preserving item-level isolation.                                                                                    |
| Done/Shipped: Headless `wordpress-importer:import` console command justifies console surface                                                                                                                                                                                                                                                     | Done   | M      | Med                   | §3, §4                                                                                                                                                                                                                                                       |
| Done/Shipped: Add Feature/Integration test: end-to-end WXR → Page outcome                                                                                                                                                                                                                                                                        | Done   | M      | High                  | §4 — `WxrReaderTest` executes a WXR preview through `ExecuteExternalPageImportAction` and asserts created parent/child Pages, Page URLs, a completed session, and rollback report.                                                                           |
| Gutenberg block + shortcode → Capell component conversion                                                                                                                                                                                                                                                                                        | Later  | L      | High (differentiator) | §3, §5                                                                                                                                                                                                                                                       |
| Custom post types, comments, nav-menu import (configurable)                                                                                                                                                                                                                                                                                      | Later  | L      | Med                   | §3                                                                                                                                                                                                                                                           |
| Done/Shipped: Downscope `admin`/`console` capabilities to remove manifest mismatch                                                                                                                                                                                                                                                               | Done   | S      | Med                   | §4                                                                                                                                                                                                                                                           |