Skip to content

WordPress Importer — Improvement & Growth Plan

Package: capell-app/wordpress-importer · Kind: package · Tier: premium · Product group: Capell Operations · Bundle: operations · Status: Draft

WordPress Importer is a focused Migration Assistant source adapter: it registers one WxrReader (src/Services/WxrReader.php) into Migration Assistant’s ImportSourceRegistry (prepended ahead of MA’s own XmlReader) so .xml uploads are sniffed for a WordPress WXR namespace and parsed into MA’s neutral ExternalImportReadResult row shape. The package also ships a headless preview command (src/Console/Commands/ImportWordPressWxrCommand.php), translated Diagnostics probes (src/Health/WordpressImporterHealthCheck.php), and WXR preview mapping data/actions. It declares no migrations, no settings, no permissions, no Filament resources (capell.json database/settings/permissions all empty); the admin surface is still Migration Assistant’s import workflow, while the console surface is justified by the package-owned preview command. Relationship to migration-assistant: hard requires dependency; WordPress Importer is a pure consumer/extension of MA contracts (ImportSourceReader, ExternalImportReadResult, SafeXmlLoader, XmlReader) and owns none of the import session, final execution, or rollback machinery.

Current marketplace summary (verbatim): “Preview WordPress WXR posts and pages in Capell with durable metadata ready for Migration Assistant mapping.” Screenshot count in capell.json.marketplace.screenshots: 4 (extension-card.jpg, wordpress-wxr-source-selection.png, wordpress-wxr-preview.png, wordpress-import-session.png). The screenshot contract and marketplace manifest are reconciled: the three WXR workflow PNGs named by docs/screenshots.json exist on disk, and the manifest no longer promotes older static hero assets.

  1. Done/Shipped: Make the Migration Assistant registry path-aware for WXR selectionWxrReader::supportsPath() stream-sniffs readable XML paths and refuses generic XML, and MA’s ImportSourceRegistry now asks path-aware readers before extension fallback. Generic XML remains owned by Migration Assistant’s XmlReader, including extension-only export.xml lookups, so the WordPress adapter can no longer mask a missing generic XML reader — src/Services/WxrReader.php, src/Actions/BuildWordPressImportPreviewAction.php, migration-assistant registry — M.
  2. Done/Shipped: Keep non-WXR XML out of the WordPress readerWxrReader::read() now rejects non-WXR XML instead of building a generic XML fallback. This keeps the parser boundary clear: WordPress Importer parses WordPress exports only; Migration Assistant parses generic XML — src/Services/WxrReader.php, migration-assistant registry — S.
  3. Surface CDATA/excerpt edge cases as columns consistently — why: columnsFor() derives columns from the union of row keys (src/Services/WxrReader.php:147-150), but every row always has the same fixed keys from rowFromItem(), so the union logic is dead complexity; a static column list would be clearer and cheaper — src/Services/WxrReader.php — S.
  4. Done/Shipped: health check now asserts real runtime requirementsWordpressImporterHealthCheck::runDiagnostics() reports SimpleXML availability, WXR reader registration in Migration Assistant’s registry, and Migration Assistant reader contract compatibility; focused health coverage exists in tests/Unit/Health/WordpressImporterHealthCheckTest.phpsrc/Health/WordpressImporterHealthCheck.php — M.
  5. Done/Shipped: PHP/runtime metadata is aligned with the repo baseline — root composer.json, composer.local.json, and packages/wordpress-importer/composer.json all require php: ^8.3, matching the repo’s PHP 8.3 minimum; do not pin this package to ^8.4 unless the whole Capell 4 package baseline moves together — composer.json:7 — S.
  6. README “Built With” omits the hard ext-simplexml requirement from its dependency narrative while listing it elsewhere — minor doc consistency; ensure the parser dependency is stated once authoritatively — README.md / composer.json:8 — S.

The package advertises focused WXR reader/preview capabilities in capell.json and deliberately avoids claiming package-owned admin resources or final migration execution. Against WordPress-import norms:

  • Done/Shipped for preview/console: extracted WordPress fields now map into explicit meta.wordpress.* metadata. BuildWordPressImportPreviewAction owns the WXR-specific field mapping for the package’s headless preview/console path, so categories, tags, author login, media URLs, old permalinks, IDs, Gutenberg/shortcode signals, and related WordPress metadata no longer fall into generic meta.imported.*. This does not yet solve Migration Assistant execution, taxonomy/author resolution, media ingest, or admin mapping UI. Evidence: ImportWordPressWxrCommand delegates to the Action, and WxrReaderTest asserts explicit meta.wordpress.* output with no meta.imported fallback. — src/Actions/BuildWordPressImportPreviewAction.php, src/Data/WordPressImportPreviewData.php, src/Console/Commands/ImportWordPressWxrCommand.php, tests/Unit/WxrReaderTest.php
  • Done/Shipped for the core page execution path: external reader rows can now move beyond preview. Migration Assistant owns ExecuteExternalPageImportAction, which accepts an ExternalImportPreview, validates required Capell page defaults, converts create descriptors into a package-like in-memory payload, restores slug-derived page URLs and source parent relationships, executes through PageImportService, stores an ImportSession, and creates rollback reports even when a failed report created pages. WordPress Importer remains a source adapter: its WXR preview can be passed into the MA action with target page defaults to create routable Pages. Remaining gaps are taxonomy/author resolution, media ingest, redirects, idempotency, and admin wizard wiring.
  • Done/Shipped: Media import + content URL rewrite after execution. Completed Migration Assistant page imports now trigger ImportWordPressMediaForPagesAction, which downloads parsed meta.wordpress.media_urls, attaches them to the created Page’s wordpress-import media collection, records meta.wordpress.imported_media, and rewrites exact WordPress media URLs in stored meta.content to local media URLs. Failed downloads are skipped without blocking the page import. — src/Actions/ImportWordPressMediaForPagesAction.php, src/Listeners/ImportWordPressMediaForCompletedImport.php, tests/Unit/WxrReaderTest.php
  • Gutenberg blocks / shortcodes passed through verbatim. post_content is stored raw (src/Services/WxrReader.php:93); <!-- wp:* --> block comments and [shortcode] markup land untouched in meta.content. No converter to Capell components/blocks. Major differentiator.
  • Shipped 2026-06-08: permalink redirects, redirect reports, and rollback evidence are created after completed imports when URL Manager is available. Completed Migration Assistant imports now trigger a WordPress Importer listener that loads created Pages, reads meta.wordpress.old_permalink/link, resolves the imported Capell Page URL, and upserts exact URL Manager redirects with a translated note. The listener stores a wordpress_permalink_redirects report on the import session and rollback report, and newly-created redirect rule IDs are merged into created_models so rollback execution can remove URL changes alongside imported pages. The integration is optional at runtime, so WordPress Importer still works without URL Manager installed. — src/Actions/CreateWordPressPermalinkRedirectsAction.php, src/Data/WordPressPermalinkRedirectReportData.php, src/Listeners/CreateWordPressRedirectsForCompletedImport.php, src/Providers/WordPressImporterServiceProvider.php, capell.json
  • Custom post types dropped. read() hard-filters to ['page','post'] (src/Services/WxrReader.php:42); CPTs, attachments-as-posts, nav menus, and comments are skipped. No configurability.
  • Author mapping is a raw login string (dc:creator), not resolved to a Capell user. No mapping UI.
  • Comments not imported (norm for blog migrations; ties to the blog package).
  • Done/Shipped: console surface is justified. ImportWordPressWxrCommand produces a headless Migration Assistant preview JSON payload for scripted migration audits, and the package provider registers it through Spatie package tools.
  • Done/Shipped: manifest capabilities match package ownership. capell.json now declares WXR reader, preview, metadata, media-reference, Gutenberg/shortcode detection, and headless-preview capabilities instead of the old generic admin/console capability claims. The admin workflow remains Migration Assistant-owned while WordPress Importer contributes the source reader and preview metadata. — capell.json, docs/overview.md
  • Done/Shipped: health check severity now maps to real probes. See §2.4. The critical health check now fails when SimpleXML is unavailable, the WXR reader is not registered with Migration Assistant, or the reader no longer implements the expected Migration Assistant contract — src/Health/WordpressImporterHealthCheck.php, capell.json.
  • Done/Shipped: screenshot manifest coverage is reconciled. docs/screenshots.json names the source selection, WXR preview, and import session PNGs that now exist on disk, and capell.json.marketplace.screenshots includes those three workflow images plus the extension card and both hero assets. Add a manifest↔asset consistency test if future screenshot drift becomes common — docs/screenshots.json, capell.json.
  • Done/Shipped: large WXR exports stream beyond Migration Assistant’s DOM safety cap. WxrReader::read() now uses XMLReader when available, scans metadata, parent attachments, and post/page rows in streaming passes, rejects DOCTYPE declarations, and preserves item-level isolation while only loading the current <item> into SafeXmlLoader. Focused coverage builds a WXR fixture larger than SafeXmlLoader::DEFAULT_MAX_BYTES and proves it reads successfully. — src/Services/WxrReader.php, tests/Unit/WxrReaderTest.php
  • Shipped 2026-06-06: malformed post/page items are isolated. WxrReader::read() now catches item-level failures while parsing WordPress posts/pages, skips only the bad item, and records metadata.item_errors plus metadata.skipped_item_count on the read result so one malformed WXR item no longer aborts neighboring valid rows. Document-level XML/WXR failures still fail fast. — src/Services/WxrReader.php
  • DOCTYPE/XXE: covered upstream, not owned. Security rests entirely on MA’s SafeXmlLoader (DOCTYPE rejection, LIBXML_NONET, null entity loader). The package test rejects WordPress WXR imports with doctype declarations (tests/Unit/WxrReaderTest.php:95) asserts this, which is good — but if MA ever relaxes SafeXmlLoader, this package silently inherits the regression. Add a contract/regression test pinning the expectation.
  • Shipped 2026-06-06: preview de-duplicates previously imported WordPress source identities. BuildWordPressImportPreviewAction now applies ApplyWordPressPreviewIdempotencyAction after field mapping; rows whose meta.wordpress.source_identity already exists on a Capell Page are marked skip with a translated reason, so a repeated WXR preview no longer proposes duplicate Page creation for already imported WordPress posts/pages. Remaining resumability depth is job-level resume/retry state beyond Migration Assistant’s session retry support. — src/Actions/ApplyWordPressPreviewIdempotencyAction.php, src/Actions/BuildWordPressImportPreviewAction.php
  • Test coverage gaps. tests/Unit/WxrReaderTest.php covers registration, WP posts/pages read, generic XML staying with Migration Assistant’s XML reader, package-owned preview rejection for generic XML, DOCTYPE rejection, console preview JSON, explicit WordPress metadata mapping, and a WXR preview executing through ExecuteExternalPageImportAction into routable parent/child Pages, Page URLs, completed import session, media import, permalink redirect report, and rollback removal of created URL Manager redirects. tests/Unit/Health/WordpressImporterHealthCheckTest.php covers health diagnostics. Not covered: CPT filtering, multi-attachment items beyond the first inline URL plus parent attachments, empty/missing wp: namespace fallback path, oversize-file rejection, malformed item, category-vs-tag domain disambiguation under odd domain attrs, and manifest↔asset consistency.
  • Performance budget is nominal. capell.json.performance.adminQueryBudget: 40, frontendRenderBudgetMs: 0, cacheable:false — reasonable for a console/parse package, but there’s no test or benchmark enforcing parse-time on a representative export.
  • i18n. Health and command copy is translated, but reader exception strings (src/Services/WxrReader.php) are hardcoded English sprintfs — acceptable for developer-facing exceptions, but admin-surfaced errors should be translatable per Capell conventions.
  • Done/Shipped: CHANGELOG now records package work. CHANGELOG.md documents the June 3-4 copy, screenshot, command, WXR preview, health, and XML reader boundary changes; keep it current as execution/media/redirect work lands.

Done/Shipped copy refresh. Current marketplace and composer copy now consistently frames WordPress Importer as a Migration Assistant source with preview, preserved WordPress metadata, media import, URL Manager redirect preservation, redirect reporting, and rollback-safe handoff.

Reference 1-sentence summary for future execution/media expansion:

Migrate your WordPress site into Capell — import posts, pages, and media from a standard WXR export, preview every change, and roll back safely.

Reference 3–4 sentence description for future execution/media expansion:

WordPress Importer turns a standard WordPress WXR export into a guided Capell migration. It reads your posts and pages, maps WordPress fields onto Capell’s page schema, and hands them to Migration Assistant for preview, validation, mapping, and one-click rollback — so you see exactly what will change before anything is written. Built to extend rather than replace, it slots into the Operations bundle alongside Migration Assistant and pairs with URL Manager to preserve your old permalinks and SEO equity. The fastest, lowest-risk path off WordPress and onto Capell.

(Note: the description should only promise media import / URL preservation after §3 gaps are closed; today it would overstate.)

Done/Shipped screenshot/media reconciliation. docs/screenshots.json names the WXR source selection, parsed preview, and import session screenshots; all three PNGs are committed under docs/screenshots/. capell.json.marketplace.screenshots now declares those workflow screenshots alongside the extension card, so buyer-facing marketplace media matches the documented screenshot contract.

Pricing/tier/bundle positioning. Tier premium, bundle operations, license paid/first-party/priority support (capell.json.commercial). WordPress import is a top-of-funnel acquisition driver — it is often the reason a prospect evaluates Capell at all. Consider positioning the parser as a low-friction lead-in (even bundling read/preview free) and monetising the execution + media + redirect layer as the premium hook, maximising trial-to-paid conversion. Keep it inside the Operations bundle so it cross-sells the bundle.

Cross-sell (deps + Extension Suites). Hard dep on migration-assistant (the engine — always co-sold). Natural attach: url-manager (preserve WP permalinks → redirects; today link is parsed but unused — the integration is half-built), seo-suite (redirect opportunity reports already consume suggestedTargetUrl), media-library (download wp:attachment_url into Spatie media). Frame as a “WordPress Migration Suite.”

Differentiators / value props / target buyer. Differentiators (once §3 is delivered): Gutenberg-block→Capell-component conversion, permalink/redirect preservation, media ingest, author mapping. Value props: lower switching cost, no SEO loss, full preview + rollback. Target buyer: agencies and site owners migrating off WordPress; the developer evaluating Capell as a WP replacement.

Keywords/tags (8–12): wordpress, wxr, migration, importer, cms-migration, content-migration, posts, pages, gutenberg, permalinks, seo-redirects, capell. (Composer currently lists only 5: capell, cms, laravel, wordpress, migrationcomposer.json:38-44.)

ItemBucketEffortImpactSection ref
Done/Shipped for core execution: external WXR preview rows can execute into Pages through Migration Assistant sessions and rollback reports. Admin wizard wiring, media, redirects, and idempotency remain separate gaps.DoneLCritical§3
Done/Shipped for preview/console: Map WP categories/tags/author/media instead of dumping to meta.imported.*. Evidence: BuildWordPressImportPreviewAction applies package-owned WXR mappings into meta.wordpress.*, and WxrReaderTest covers categories, tags, author login, media URLs, featured media, and no meta.imported fallback.DoneMHigh§2.1, §3
Done/Shipped: Make Migration Assistant registry path-aware for WXR reader selectionDoneMHigh§2.1, §2.2
Done/Shipped: Reconcile screenshots (screenshots.jsoncapell.json) and add real imagesDoneSHigh§4, §5
Done/Shipped: Keep Migration Assistant naming consistent in description/docsDoneSMed§5
Done/Shipped: Implement real health-check probes (registry + ext-simplexml + reader contract)DoneMMed§2.4, §4
Done/Shipped: Confirm PHP/runtime metadata stays on the repo-wide ^8.3 baseline after copy/keyword refreshDoneSMed§2.5, §5
Done/Shipped: Media import downloads wp:attachment_url into Page media + rewrites content URLsDoneLHigh§3 — completed page imports now download parsed WordPress media URLs, attach them to wordpress-import, record source/local URL pairs, and rewrite stored meta.content away from old WordPress asset URLs.
Shipped 2026-06-08: Permalink → redirect preservation, redirect report, and rollback evidence via URL Manager integrationDoneMHigh§3, §5 — completed imports now upsert exact redirects from WordPress old permalinks to imported Capell page URLs when URL Manager is installed, store a session/rollback redirect report, and add newly-created redirect rules to rollback created_models.
Shipped 2026-06-06: Idempotent re-import preview de-dupes by WordPress source_identityDoneSMed§4 — repeated previews skip rows whose meta.wordpress.source_identity already exists on a Capell Page.
Shipped 2026-06-06: Per-item error isolation for malformed WXR items before preview buildDoneMMed§4 — malformed post/page items are skipped with read-result metadata while valid neighboring rows continue.
Done/Shipped: Streaming XMLReader parser for exports > 50MBDoneLMed§4 — WXR reads now stream metadata, attachments, and post/page rows in separate passes beyond Migration Assistant’s DOM safety cap while preserving item-level isolation.
Done/Shipped: Headless wordpress-importer:import console command justifies console surfaceDoneMMed§3, §4
Done/Shipped: Add Feature/Integration test: end-to-end WXR → Page outcomeDoneMHigh§4 — WxrReaderTest executes a WXR preview through ExecuteExternalPageImportAction and asserts created parent/child Pages, Page URLs, a completed session, and rollback report.
Gutenberg block + shortcode → Capell component conversionLaterLHigh (differentiator)§3, §5
Custom post types, comments, nav-menu import (configurable)LaterLMed§3
Done/Shipped: Downscope admin/console capabilities to remove manifest mismatchDoneSMed§4