Bulk Editing Posts at WordPress.com with the REST API

A little while ago I migrated my personal blog over to WordPress.com – and didn’t notice for quite some time that there were issues in the body text of some of the older posts (the blog has several thousand posts). If the blog had been hosted on my own server, I could have just written a script to update the content directly in the database, but it is hosted at WordPress.com – so that wasn’t an option.

I had a play with the WordPress REST API, and am happy to report that it allowed me to not only load all of the posts from my blog via a script, but also update them.

The script below is purely a guide – it will not work “out of the box”, as you will see if you read the various notes. It’s a template you can fashion to do what you want by adding the various pieces together. In my “real” version, all of the snippets are in one script, one after another.

Oh – and finally – worth noting that this is PHP, and I ran it at the command line in a virtual machine running Ubuntu Server 16.x, spun up at DigitalOcean, and then destroyed afterwards. It cost pennies for the time it existed. The only installs I had to do on the VM were PHP 7 and the PHP cURL extension. There would be nothing to stop you converting it into a PHP script running in a browser, except you would probably hit time-outs. The nice thing about running it at the command line is you get to see progress as it runs.

Get an Access Token

Although some methods of the WordPress API (such as retrieving sites and posts) require no authentication, we will be calling update later – so we will need an access token. To get one you have to configure an application at developer.wordpress.com/apps, which will give you a Client ID and a Client Secret string (the snippet below should be self-explanatory).

$client_id = '...';
$client_secret = '...';
$site_url = 'your_blog_name.wordpress.com';
$username = '...';
$password = '...';

// get an access token
$curl = curl_init( 'https://public-api.wordpress.com/oauth2/token' );
curl_setopt( $curl, CURLOPT_POST, true );
curl_setopt( $curl, CURLOPT_POSTFIELDS, array(
    'client_id' => $client_id,
    'client_secret' => $client_secret,
    'grant_type' => 'password',
    'username' => $username,
    'password' => $password,
) );
curl_setopt( $curl, CURLOPT_RETURNTRANSFER, 1);
$auth = curl_exec( $curl );
$auth = json_decode($auth);
$access_token = $auth->access_token;

print "Access Token [".$access_token."]\r\n\r\n";
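
The snippet above assumes the token request always succeeds. A minimal sketch of a more defensive version follows; the helper name extract_access_token is hypothetical, and the "error" / "error_description" field names are an assumption based on the usual OAuth2 error format rather than something taken from the original script.

```php
<?php
// Hypothetical helper (not part of the original script): turn the raw body
// returned by the token endpoint into an access token, or fail loudly.
function extract_access_token($raw_body)
{
    if ($raw_body === false || $raw_body === '') {
        throw new RuntimeException('No response from the token endpoint');
    }

    $auth = json_decode($raw_body);
    if (!is_object($auth)) {
        throw new RuntimeException('Token response was not a JSON object');
    }

    if (!isset($auth->access_token)) {
        // Assumption: failures carry "error" and/or "error_description" fields.
        if (isset($auth->error_description)) {
            $reason = $auth->error_description;
        } elseif (isset($auth->error)) {
            $reason = $auth->error;
        } else {
            $reason = 'unknown error';
        }
        throw new RuntimeException('No access token returned: ' . $reason);
    }

    return $auth->access_token;
}
```

It would slot in as $access_token = extract_access_token( curl_exec( $curl ) ); so the script stops before attempting any updates with an empty token.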

Get Site Information

The update call further down addresses the site by its internal WordPress ID – to get this you need to call the Sites API.

// get site info
$site_options = array (
    'http' =>
    array (
    'ignore_errors' => true,
    ),
);
$site_context = stream_context_create( $site_options );
$site_response = file_get_contents(
    'https://public-api.wordpress.com/rest/v1.2/sites/'.$site_url.'/',
    false,
    $site_context
);
$site_response = json_decode( $site_response );
$site_id = $site_response->ID;
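
If the site name is mistyped, the response will not contain an ID and the later update calls will quietly go to the wrong URL. A small sketch of a check is below; both function names are hypothetical, and the {"error": ..., "message": ...} shape of an error response is an assumption about the API rather than something the original script relies on.

```php
<?php
// Hypothetical helper: decode a JSON body from the API and stop on errors.
function decode_api_response($raw_body)
{
    // file_get_contents() returns false on failure; cast so json_decode copes.
    $decoded = json_decode((string) $raw_body);
    if (!is_object($decoded)) {
        throw new RuntimeException('Response was not a JSON object');
    }

    // Assumption: error responses look like {"error": "...", "message": "..."}.
    if (isset($decoded->error)) {
        $message = isset($decoded->message) ? $decoded->message : 'no message';
        throw new RuntimeException($decoded->error . ': ' . $message);
    }

    return $decoded;
}

// Hypothetical helper: pull the numeric site ID out of a Sites API response.
function extract_site_id($raw_body)
{
    $site = decode_api_response($raw_body);
    if (!isset($site->ID)) {
        throw new RuntimeException('Site response did not include an ID');
    }
    return (int) $site->ID;
}
```

With this in place the last two lines of the snippet above collapse to $site_id = extract_site_id( $site_response );.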

Retrieve the Posts and Update Them

To get hold of the posts from the blog, we need to repeatedly call the posts API, with a number of parameters – essentially the number of posts to grab in each iteration, and the number of pages to try and loop through. There are a number of ways of iterating the pages – I have gone with a very hacky way that suited my needs – you could be far more clever, and use the page_handle data that comes back with the response data.
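
For comparison, here is a rough sketch of the page_handle approach. It assumes each response carries a meta->next_page value while more pages remain, and that the value is passed back as the page_handle query parameter. The function name fetch_all_posts and the $fetch callable are hypothetical; $fetch stands in for whatever does the HTTP request, which also makes the loop easy to try out without touching the network. The script that follows sticks with the simpler page counter.

```php
<?php
// Sketch: collect every post by following meta->next_page.
// $fetch is a hypothetical callable that takes a URL and returns the raw
// JSON body, e.g. a thin wrapper around file_get_contents().
function fetch_all_posts($site, $posts_per_page, callable $fetch)
{
    $base = 'https://public-api.wordpress.com/rest/v1.1/sites/' . $site . '/posts/';
    $query = array('number' => $posts_per_page);
    $all_posts = array();

    do {
        $response = json_decode((string) $fetch($base . '?' . http_build_query($query)));
        if (!is_object($response) || empty($response->posts)) {
            break; // request failed, or there are no more posts
        }

        foreach ($response->posts as $post) {
            $all_posts[] = $post;
        }

        // Assumption: next_page is present only while more pages exist.
        $next = isset($response->meta->next_page) ? $response->meta->next_page : null;
        $query['page_handle'] = $next; // http_build_query() URL-encodes the value
    } while ($next !== null);

    return $all_posts;
}
```

The advantage over a fixed page count is that the loop stops by itself when the blog runs out of posts, instead of making empty requests up to $pages.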

// configuration parameters
$posts_per_page = 20;
$pages = 200;
$search_pattern = "..."; // regex (with delimiters) that identifies posts whose content needs updating
$replace_search_pattern = "..."; // regex (with delimiters) matching the text to be replaced
$replace_pattern = "..."; // the replacement string (not a regex, but it may use back-references)

// setup the post context
$posts_options = array ( 'http' => array ('ignore_errors' => true, ),);
$posts_context = stream_context_create( $posts_options );

// loop through the pages
for ($page=1; $page<=$pages; $page++)
{
    $posts_url = 'https://public-api.wordpress.com/rest/v1.1/sites/'.$site_url.'/posts/?page='.$page.'&number='.$posts_per_page .'&fields=ID,title,content';
    $posts_response = file_get_contents( $posts_url, false, $posts_context);
    $posts_response = json_decode( $posts_response );
    for ($i=0; $i<count($posts_response->posts); $i++) {
        $post = $posts_response->posts[$i];
        print " - ".$post->ID." ".$post->title;

        // does the post have a pattern match in it ?
        $match_result = preg_match($search_pattern,$post->content);
        if ($match_result > 0) {
            print " MATCH FOUND";
            $post_id = $post->ID;
            $updated_content = preg_replace($replace_search_pattern, $replace_pattern, $post->content);

            print "\r\n\r\n".$updated_content."\r\n\r\n";

            // do the update
            $update_options = array (
                'http' => array (
                    'ignore_errors' => true,
                    'method' => 'POST',
                    'header' => array (
                        0 => 'authorization: Bearer '.$access_token,
                        1 => 'Content-Type: application/x-www-form-urlencoded',
                    ),
                    'content' => http_build_query( array (
                        'content' => $updated_content,
                    )),
                ),
            );

            $update_context = stream_context_create( $update_options );
            $update_response = file_get_contents('https://public-api.wordpress.com/rest/v1.2/sites/'.$site_id.'/posts/'.$post_id,false,$update_context);
            $update_response = json_decode( $update_response );

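            // only report success if the API sent the updated post back
            if (isset($update_response->ID)) {
                print " UPDATED";
            } else {
                print " UPDATE FAILED";
            }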
        }

        print "\r\n";
    }
}
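
Since the three patterns are left as "..." in the script, here is a small worked example of how they fit together, run against a sample string rather than a live post. The domains, the patterns and the rewrite_content function are all made up for illustration; the idea is to test patterns on a copy of some post content like this before letting the script write anything back.

```php
<?php
// Hypothetical patterns: move image links from an old domain to a new one.
$search_pattern         = '#src="http://old\.example\.com/#';
$replace_search_pattern = '#src="http://old\.example\.com/([^"]+)"#';
$replace_pattern        = 'src="https://new.example.com/$1"'; // $1 = captured path

// Returns the rewritten content, or null when the post needs no change.
function rewrite_content($content, $search, $replace_search, $replacement)
{
    if (preg_match($search, $content) !== 1) {
        return null; // no match (or the search pattern itself is invalid)
    }
    return preg_replace($replace_search, $replacement, $content);
}

$sample = '<p><img src="http://old.example.com/a.png"> and '
        . '<img src="http://old.example.com/b.png"></p>';

$updated = rewrite_content($sample, $search_pattern, $replace_search_pattern, $replace_pattern);
print ($updated === null ? 'no change' : $updated) . "\n";
```

Note that preg_replace replaces every match in the content, not just the first, so a single update call per post is enough.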

It’s a little bit technical in places, but most of this code was lifted from the WordPress API documentation. As I said at the start – this is not a working solution that you can just paste in – it’s a guide to how you can interact with the WordPress.com API from PHP. Hopefully it will be useful to somebody else at some point.
