Documentation

Create a key and try the API in seconds following the steps below.

Usage

Make a GET request to the extract endpoint with a URL-encoded url query parameter and an API key. Don't have an API key yet? Head over to your account page.

https://api.pagemunch.com/extract?apiKey=YOUR_API_KEY&url=https://www.instagram.com/p/BLXc4HsDoTu
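Since the url parameter should be URL-encoded, it can help to let a library build the query string instead of concatenating it by hand. The following is a minimal Python sketch using only the standard library; the function name build_extract_url is a hypothetical helper for illustration, not part of any official client.

```python
from urllib.parse import urlencode

# Endpoint taken from the example above.
API_ENDPOINT = "https://api.pagemunch.com/extract"


def build_extract_url(api_key: str, target_url: str) -> str:
    """Return the extract endpoint URL with both parameters percent-encoded.

    Hypothetical helper for illustration only.
    """
    query = urlencode({"apiKey": api_key, "url": target_url})
    return f"{API_ENDPOINT}?{query}"


print(build_extract_url("YOUR_API_KEY", "https://www.instagram.com/p/BLXc4HsDoTu"))
```

Encoding matters most when the target URL carries its own query string: an unencoded & in the target would otherwise be read as the start of a new parameter of the API request.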

Authentication

All API requests must be made over SSL / HTTPS with an API Key. Requests made over HTTP will redirect. API requests without a key will result in an error response.

Authentication to the API can be performed using HTTP Basic Auth. Provide your API key as the basic auth username value. You do not need to provide a password:

curl "https://api.pagemunch.com/extract?url=https://www.instagram.com/p/BLXc4HsDoTu" \
  -u YOUR_API_KEY:

Alternatively, you may find it simpler to pass your API key as an apiKey query parameter:

curl "https://api.pagemunch.com/extract?apiKey=YOUR_API_KEY&url=https://www.instagram.com/p/BLXc4HsDoTu"
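The Basic Auth variant described above can be reproduced outside curl as well. This is a minimal Python sketch, assuming only the standard library: it builds the Authorization header by hand from the API key as the username and an empty password, and prepares the request without sending it. The function name build_basic_auth_request is hypothetical.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_basic_auth_request(api_key: str, target_url: str) -> Request:
    """Prepare a GET request that sends the API key as the Basic Auth username.

    Hypothetical helper for illustration only.
    """
    # HTTP Basic Auth encodes "username:password"; the password is empty here.
    token = base64.b64encode(f"{api_key}:".encode("utf-8")).decode("ascii")
    query = urlencode({"url": target_url})
    return Request(
        f"https://api.pagemunch.com/extract?{query}",
        headers={"Authorization": f"Basic {token}"},
    )


request = build_basic_auth_request(
    "YOUR_API_KEY", "https://www.instagram.com/p/BLXc4HsDoTu"
)
print(request.full_url)
print(request.get_header("Authorization"))
# To actually send it: urllib.request.urlopen(request)
```

Sending the key in a header rather than in the query string keeps it out of the request URL itself.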

Errors

Error responses always include an error key and a title describing the problem. They also return a non-200 HTTP status code so you can easily filter them in your application. An example:

{ "error": true, "title": "An API key is required" }
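One way to act on these two signals in application code is to check both the status code and the error key before using the payload. The sketch below is illustrative only: it assumes the response body is JSON, and PagemunchError, parse_response, and the 401 status used in the demo are hypothetical choices, since the specific status codes are not listed here.

```python
import json


class PagemunchError(Exception):
    """Raised when a response signals an error (hypothetical helper class)."""

    def __init__(self, status: int, title: str) -> None:
        super().__init__(f"HTTP {status}: {title}")
        self.status = status
        self.title = title


def parse_response(status: int, body: str) -> dict:
    """Return the decoded payload, or raise if the response is an error."""
    payload = json.loads(body)
    # Treat either signal as an error: a non-200 status or a truthy error key.
    if status != 200 or payload.get("error"):
        raise PagemunchError(status, payload.get("title", "Unknown error"))
    return payload


# Demo with the example error body; 401 is an assumed status code.
try:
    parse_response(401, '{"error": true, "title": "An API key is required"}')
except PagemunchError as exc:
    print(exc)
```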

Response

  • type string

    Returns the type of resource at this URL, one of the following:
    • website
    • image
    • video
    • file
  • original_url url

    The URL that was passed to the API. This may be a short link; if there is no redirect, it will be the same as the url attribute.
  • url url

    The canonical URL if found, otherwise this will be the URL after all redirects are followed.
  • short_url url

    A preferred or branded short URL for the resource, when the publisher provides one.
  • provider_name string

    The name of the resource provider or content publisher, for example "TechCrunch".
  • provider_url url

    The URL of the resource provider or content publisher, for example "https://techcrunch.com".
  • title string

    The title of the resource.
  • description string

    The description of the resource.
  • authors array

    A list of all the authors that are associated with the resource, usually only available for articles. Each author may contain a url and a name.

      {
        "name": "Dave Lee",
        "url": "http://www.bbc.com/news/correspondents/davelee"
      }
    
  • tags array

    A list of tags that can be used to describe this resource. The list is parsed from meta tags and the document itself.
  • images array

    A list of images, ordered by how well they represent the resource, best first.

      {
        "thumbnail_url": "http://ichef-1.bbci.co.uk/production/_92958661_whatsubject.jpg",
        "height": 576,
        "width": 1024,
        "type": "jpg"
      }
    
  • entities array

    A list of entities (organizations, locations and people) in order of appearance in the resource.

      {
        "type": "organization",
        "name": "Google",
        "count": 1
      }
    
  • labels array

    A list of labels that can include prices, genres, categories or other data that the resource chooses to highlight.

      {
        "name": "Reading time",
        "value": "10 minutes"
      }
    
  • file_type string

    If the resource is a file, this attribute will be a human-readable description, for example "Microsoft Office Spreadsheet".
  • file_size integer

    If the resource is a file then this attribute will be the size in bytes of the entire file.
  • file_extension string

    If the resource is a file, this will be the file extension parsed from the document (not the URL).
  • published_at datetime

    The date and time that the resource was published.
  • modified_at datetime

    The last time this resource was modified or updated by the provider, for example blog posts are often updated after they are first published.
  • accessed_at datetime

    The last time this resource was accessed by Pagemunch. This is usually the moment of the request, but may be any time in the prior 24 hours.
  • elapsed_time integer

    The time in milliseconds it took Pagemunch to retrieve and parse the request. Note: You may receive a response faster than this if the URL is already cached.
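To show how the attributes above might be read in practice, here is a minimal Python sketch. The sample dictionary is stitched together from the fragment examples in this section and is not a real API response; the helper names are hypothetical. Every field is treated as optional, on the assumption that not all attributes are present for every resource.

```python
from typing import Any, Optional

# Illustrative payload assembled from the fragment examples above.
sample: dict[str, Any] = {
    "type": "website",
    "authors": [
        {"name": "Dave Lee", "url": "http://www.bbc.com/news/correspondents/davelee"}
    ],
    "images": [
        {
            "thumbnail_url": "http://ichef-1.bbci.co.uk/production/_92958661_whatsubject.jpg",
            "height": 576,
            "width": 1024,
            "type": "jpg",
        }
    ],
    "entities": [{"type": "organization", "name": "Google", "count": 1}],
    "labels": [{"name": "Reading time", "value": "10 minutes"}],
}


def best_image_url(resource: dict[str, Any]) -> Optional[str]:
    """Return the first image's thumbnail_url, since images are ordered best first."""
    images = resource.get("images") or []
    return images[0].get("thumbnail_url") if images else None


def author_names(resource: dict[str, Any]) -> list[str]:
    """Collect author names, skipping authors that only have a url."""
    return [a["name"] for a in resource.get("authors") or [] if a.get("name")]


def entity_names(resource: dict[str, Any], entity_type: str) -> list[str]:
    """Return entity names of one type, in order of appearance."""
    return [
        e["name"]
        for e in resource.get("entities") or []
        if e.get("type") == entity_type and e.get("name")
    ]


print(best_image_url(sample))
print(author_names(sample))
print(entity_names(sample, "organization"))
```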