Migrate from a CSV to content entities with Paragraphs
By Christophe Jossart (@colorfield)
This article explains how to use migration templates with a CSV file in which the Paragraphs data of a single host entity is spread over several lines.
A CSV could also hold the Paragraphs data inline, with one line per host entity and one group of columns per paragraph. That case is covered by the excellent article Migration of CSV Data into Paragraphs.
ID | Host entity title | Paragraph1 field1 | Paragraph1 field2 | Paragraph2 field1 | Paragraph2 field2 |
---|---|---|---|---|---|
1 | Jimi Hendrix | Axis: Bold as Love | https://www.deezer.com/fr/album/454044 | Live At The Fillmore East | https://www.deezer.com/fr/album/454045 |
2 | The Doors | Strange Days | https://www.deezer.com/fr/album/340880 | L.A. Woman | https://www.deezer.com/fr/album/6415260 |
In our case, we assume that the Paragraphs data is spread over several lines, one paragraph per line, so the structure looks more like this:
ID | Host entity title | Paragraph field 1 | Paragraph field 2 |
---|---|---|---|
1 | Jimi Hendrix | Axis: Bold as Love | https://www.deezer.com/fr/album/454044 |
2 | Jimi Hendrix | Live At The Fillmore East | https://www.deezer.com/fr/album/454045 |
3 | The Doors | Strange Days | https://www.deezer.com/fr/album/340880 |
4 | The Doors | L.A. Woman | https://www.deezer.com/fr/album/6415260 |
The first structure covers most use cases, but it does not scale well if we extend the discography example with more Albums or with a Tracks migration, because every additional paragraph needs its own group of columns. The second one is more readable, especially if the list needs a round of manual editing or review before import.
We assume here that we want to add a list of Albums with their Tracks.
So our CSV file looks like this:
id,album_title,track_title,track_url
1,Axis: Bold As Love,Exp,https://www.deezer.com/fr/track/4952828
2,Axis: Bold As Love,Up From The Skies,https://www.deezer.com/fr/track/4952829
3,Axis: Bold As Love,Spanish Castle Magic,https://www.deezer.com/fr/track/4952830
4,Axis: Bold As Love,Wait Until Tomorrow,https://www.deezer.com/fr/track/4952832
5,Axis: Bold As Love,Aint No Telling,https://www.deezer.com/fr/track/4952831
...
And we have this Drupal model:
Album media
- Track (Paragraphs)
- Name
- (...)
Track paragraph
- Link
- Title
- (...)
First thought: we might use a custom process plugin. This is not the best approach here because the migration happens in two steps: first the Track paragraphs, then the Album media.
It could therefore require creating a second file for the Albums, which we want to avoid.
Second approach: re-use the same CSV for the Albums, but transform it with a data parser.
We will still use the Migrate Source CSV module to create the Tracks Paragraphs in a first template, as the original structure perfectly matches our use case.
migrate_plus.migration.track_paragraphs.yml
id: track_paragraphs
label: Track Paragraphs
migration_group: discography
source:
  plugin: csv
  path: modules/custom/migrate_discography/data/album_tracks.csv
  header_row_count: 1
  keys:
    - id
process:
  field_title: track_title
  field_link:
    plugin: urlencode
    source: track_url
destination:
  plugin: entity_reference_revisions:paragraph
  default_bundle: track
migration_dependencies:
  required: {}
  optional: {}
dependencies:
  enforced:
    module:
      - migrate_discography
Then, with a data parser, we will:
- Deduplicate the albums, so that one Media is created per album title (the CSV has no album id).
- Change the structure so we can provide associative arrays that match what the Migrate Plus template expects.
We will extend the JSON data parser from Migrate Plus for that.
migrate_plus.migration.album_media.yml
id: album_media
label: Album Media
migration_group: discography
source:
  plugin: url
  data_fetcher_plugin: file
  # Make use of a custom parser here, to convert the CSV
  # into associative arrays.
  data_parser_plugin: album_parser
  track_changes: true
  urls: modules/custom/migrate_discography/data/album_tracks.csv
  item_selector: /albums
  fields:
    - name: album_title
      label: Album title
      selector: album_title
    # This field does not exist as is in the CSV
    # and is provided by the data parser.
    - name: tracks
      label: Tracks
      selector: tracks
  ids:
    album_title:
      type: string
process:
  # Media name.
  name: album_title
  # Paragraphs field.
  field_tracks:
    plugin: sub_process
    source: tracks
    process:
      temporary_ids:
        plugin: migration_lookup
        migration: track_paragraphs
        # The id is the one from the CSV,
        # used to get the right paragraph.
        source: id
      target_id:
        plugin: extract
        source: '@temporary_ids'
        index:
          - 0
      target_revision_id:
        plugin: extract
        source: '@temporary_ids'
        index:
          - 1
destination:
  plugin: entity:media
  default_bundle: album
migration_dependencies:
  required:
    - track_paragraphs
  optional: {}
dependencies:
  enforced:
    module:
      - migrate_discography
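To make the sub_process step less abstract, here is a minimal plain-PHP sketch, outside Drupal, of what it computes for one album. It assumes that the lookup against track_paragraphs yields a pair of paragraph id and revision id for each CSV id, which is why the template extracts index 0 and index 1. The $lookup map, its values and the resolveTracks() function are hypothetical stand-ins for illustration, not part of the Migrate API.

```php
<?php

// Hypothetical stand-in for the migrate map of track_paragraphs:
// CSV id => [paragraph id, paragraph revision id]. Values are made up.
$lookup = [
  '1' => [10, 25],
  '2' => [11, 26],
];

/**
 * Mimics sub_process + migration_lookup + extract for one album.
 *
 * Illustration only: this is not how the Migrate API is called.
 */
function resolveTracks(array $tracks, array $lookup): array {
  $fieldValues = [];
  foreach ($tracks as $track) {
    // "migration_lookup": CSV id => destination ids of the paragraph.
    $temporaryIds = $lookup[$track['id']] ?? NULL;
    if ($temporaryIds === NULL) {
      // Simplification of this sketch: unknown ids are skipped.
      continue;
    }
    $fieldValues[] = [
      // "extract" with index 0.
      'target_id' => $temporaryIds[0],
      // "extract" with index 1.
      'target_revision_id' => $temporaryIds[1],
    ];
  }
  return $fieldValues;
}

// One item as provided by the data parser.
$album = [
  'album_title' => 'Axis: Bold As Love',
  'tracks' => [['id' => '1'], ['id' => '2']],
];
$fieldTracks = resolveTracks($album['tracks'], $lookup);
```

Each entry of $fieldTracks carries both target_id and target_revision_id, the two values a Paragraphs reference field needs.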
AlbumParser.php
<?php

namespace Drupal\migrate_discography\Plugin\migrate_plus\data_parser;

use Drupal\migrate_plus\Plugin\migrate_plus\data_parser\Json;

/**
 * Builds relations between Albums and Tracks
 * and dedupes Album entities from a flat CSV.
 *
 * Then delegates to the Json data parser for the selectors.
 *
 * @DataParser(
 *   id = "album_parser",
 *   title = @Translation("Album parser")
 * )
 */
class AlbumParser extends Json {

  /**
   * {@inheritdoc}
   */
  protected function getSourceData($url) {
    // Get the CSV.
    $response = $this->getDataFetcherPlugin()->getResponseContent($url);

    // Convert the flat CSV into associative arrays.
    // 0 = Id
    // 1 = Album title
    // 2 = Track title
    // 3 = Track url
    $source_data = [
      'albums' => [],
    ];
    $lines = explode("\n", $response);
    // Exclude the first (header) row. Could be moved in config.
    array_shift($lines);
    $albumDetails = [];
    foreach ($lines as $line) {
      $csvLine = str_getcsv($line);
      // Skip empty lines and lines without an album title.
      if (!empty($csvLine[1])) {
        // First time this album is seen: create its entry.
        if (!array_key_exists($csvLine[1], $albumDetails)) {
          $albumDetails[$csvLine[1]] = [
            'album_title' => $csvLine[1],
            'tracks' => [],
          ];
        }
        // Attach the track (CSV id) to its album.
        $albumDetails[$csvLine[1]]['tracks'][] = [
          'id' => $csvLine[0],
        ];
      }
    }
    // Done in a second pass, so the results are a numerically indexed
    // list and not keyed by album title.
    foreach ($albumDetails as $albumDetail) {
      $source_data['albums'][] = $albumDetail;
    }

    // Section from parent class.
    // Backwards-compatibility for depth selection.
    if (is_int($this->itemSelector)) {
      return $this->selectByDepth($source_data);
    }

    // Otherwise, we're using xpath-like selectors.
    $selectors = explode('/', trim($this->itemSelector, '/'));
    foreach ($selectors as $selector) {
      if (!empty($selector)) {
        $source_data = $source_data[$selector];
      }
    }
    return $source_data;
  }

}
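The grouping done in getSourceData() can also be tried outside Drupal. The following is a minimal sketch of the same idea as a standalone function. The name groupTracksByAlbum() is hypothetical, and unlike the parser above it splits on any line ending, which is an assumption about the input rather than something the original code does. Like the parser, it does not support quoted fields that contain line breaks.

```php
<?php

/**
 * Groups flat CSV content into one item per album title.
 *
 * Hypothetical helper for illustration, not part of Migrate Plus.
 */
function groupTracksByAlbum(string $csv): array {
  // Split on any line ending, so files with Windows line endings work too.
  $lines = preg_split('/\R/', trim($csv));
  // Drop the header row.
  array_shift($lines);

  $albums = [];
  foreach ($lines as $line) {
    $row = str_getcsv($line, ',', '"', '\\');
    // Skip blank lines and lines without an album title.
    if (empty($row[1])) {
      continue;
    }
    // Column 0 is the CSV id, column 1 the album title.
    [$id, $albumTitle] = $row;
    // Keying by title dedupes the albums.
    $albums[$albumTitle]['album_title'] = $albumTitle;
    $albums[$albumTitle]['tracks'][] = ['id' => $id];
  }

  // Re-index numerically: the item selector expects a plain list.
  return ['albums' => array_values($albums)];
}

$csv = <<<CSV
id,album_title,track_title,track_url
1,Axis: Bold As Love,Exp,https://www.deezer.com/fr/track/4952828
2,Axis: Bold As Love,Up From The Skies,https://www.deezer.com/fr/track/4952829
CSV;

$data = groupTracksByAlbum($csv);
// $data['albums'] contains a single album whose tracks hold the CSV ids.
```

Keeping only the CSV id per track is enough here, because the track title and URL are already migrated by the track_paragraphs template.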
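Then we can check the status of both migrations and import them, for example with the Drush commands provided by Migrate Tools: drush migrate:status --group=discography, then drush migrate:import --group=discography.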
Here is the repository containing this example.