How to extract an embedded video into your Python code

Learn how I extracted an embedded video from Yahoo and put it into code.

Below, I show you how I extracted the embedded video source, along with a code example in Python. All right, pick a Yahoo article, and let’s dig!

HTML

As many embedded video extraction tutorials point out, finding the embedded video source through a browser’s web inspector is relatively easy. However, in order to build an efficient extractor that covers all Yahoo! Japan News articles, we need a clear, reproducible path from the original URL to the video source.

First, the obvious: I access an article and search for video-related extensions like “mp4” and “m3u8” in the source code.
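
The same keyword search can be scripted instead of done by hand in the browser. Here is a minimal sketch, assuming the page source is already available as a string; find_media_hints and the sample fragment are hypothetical and only illustrate the idea:

```python
import re


def find_media_hints(page_source, keywords=("mp4", "m3u8")):
    """Return {keyword: [snippets]} for every keyword found in the source."""
    hits = {}
    for keyword in keywords:
        # Capture up to 40 characters of context on each side of a match.
        pattern = re.compile(
            r".{0,40}" + re.escape(keyword) + r".{0,40}", re.IGNORECASE
        )
        matches = pattern.findall(page_source)
        if matches:
            hits[keyword] = matches
    return hits


# Hypothetical page fragment, for illustration only.
sample = (
    '<script class="yvpub-player" '
    'src="https://example.invalid/embed.js?contentid=1&spaceid=2"></script>'
)
print(find_media_hints(sample))
```

Passing a different keywords tuple lets you try other terms without changing the function.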

Article source code

Nada. Oftentimes, the word “player” is associated with videos, so let’s see if that works.

Article source code

Aha! An external JavaScript script, _embed.js_. Note the _contentid_ and _spaceid_ values in the parameters sent to _embed.js_. They look useful. Now, let’s check what’s inside.
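
If the two values sit in the query string of the script URL, they can be pulled out with the standard library. This is a rough sketch: the src URL is hypothetical, embed_params is my own helper name, and only the parameter names and values come from the article:

```python
from urllib.parse import parse_qs, urlparse


def embed_params(script_src):
    """Return (contentid, spaceid) from an embed.js URL's query string."""
    query = parse_qs(urlparse(script_src).query)
    # parse_qs maps each key to a list of values; take the first of each.
    return query["contentid"][0], query["spaceid"][0]


# Hypothetical src; the host and path are placeholders.
src = "https://example.invalid/embed.js?contentid=1602163&spaceid=2078710353"
content_id, space_id = embed_params(src)
print(content_id, space_id)
```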

embed.js

The code seems to reference another script, player.js, and includes a parameter: the current UNIX timestamp converted to hours.
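
For reference, converting a UNIX timestamp to hours might look like this in Python. It is a sketch that assumes the value is rounded down to whole hours; the exact rounding embed.js uses is not shown here, and unix_hours is a hypothetical name:

```python
import time


def unix_hours(now=None):
    """Return UNIX time in whole hours (seconds floor-divided by 3600)."""
    seconds = time.time() if now is None else now
    return int(seconds // 3600)


print(unix_hours(7200))  # 2
```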

Network Inspection

Let’s take a look inside player.js with Google Chrome DevTools.

player.js

player.js is huge and scary looking, and it doesn’t contain any useful mp4 or [m3u8](https://www.lifewire.com/m3u8-file-2621956) URLs either.

Okay, let’s work backwards and search for mp4 in requests that were made when we loaded the page (you may want to reload the page).

JSON response from https://feapi-yvpub.yahooapis.jp/v1/content/

Bingo! A JSON response with our m3u8 and mp4 sources. This response is generated by a request we made to https://feapi-yvpub.yahooapis.jp/v1/content/ with the following parameters:

What’s the 1602163 in …/v1/content/1602163? That’s the value of contentid we noted earlier. And space_id here matches our spaceid. Nice. What about appid? Let’s see if its value is mentioned in any other responses.

player.script.js

There it is, hard coded in player.script.js! And a quick look at other news articles confirms that this value is used for all embedded video requests. Three values down. Now, let’s search for ak’s value.

Unfortunately, the ak value is nowhere else to be found but in this JSON query. Where is ak coming from?

Breakpoints

I have a hunch that ak might be a JavaScript object property name. Let’s search for “ak:”.

_player.js_

player.script.js

Aha! It looks like ak is a concatenation of “_” and two strings in both player.js and player.script.js. We also see it in player.js passed to function k.md5().

My guess is that this concatenated string value is converted to an md5 hash value. But first we must figure out the values being concatenated to “_”.

Let’s open player.js in the Sources tab (right click inside Response body and choose Open in Sources panel) and add a breakpoint after ak is defined.

player.js

Next, let’s close our open file tabs under Sources and reload the page.

Odd. It doesn’t seem to hit that line. Let’s try the same thing in player.script.js.

player.script.js

(Make sure to click the Pretty Print brackets on the bottom left after reloading the page.)

player.script.js

Bingo! i appears to be the same value as the spaceid that we noted earlier, and r is “headlines.yahoo.co.jp”, our host name, so we now have the string value “2078710353_headlines.yahoo.co.jp”.

It looks nothing like the long, cryptic ak value in the request query, but remember that k.md5 function call in player.js? Let’s check its md5 hash value.

Would you look at that! It’s the same value as ak in the JSON request query. Nailed it.
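
The same check can be reproduced in Python with hashlib. This is a minimal sketch, assuming the key is the spaceid, “_”, and the host joined in that order; make_ak is a hypothetical helper name:

```python
import hashlib


def make_ak(space_id, host):
    """Return the md5 hex digest of '<space_id>_<host>'."""
    key = "%s_%s" % (space_id, host)
    return hashlib.md5(key.encode("utf-8")).hexdigest()


ak = make_ak("2078710353", "headlines.yahoo.co.jp")
print(ak)  # compare this with the ak value in the request query
```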

Recap

Recall that the JSON request included the following parameters, minus the thumb values.

  • appid: dj0zaiZpPVZMTV…jcmV0Jng9YjU-

  • output: json

  • space_id: 2078710353

  • domain: headlines.yahoo.co.jp

  • ak: 40e90ec7a4ffb34260fcbb9497778731

  • device_type: 1100

We now have all the unique values (and, more importantly, their sources) that are required to make a JSON request programmatically in our code for any article. One last thing on the parameters: is device_type necessary? Let’s make a request without it.

No video data. Apparently it is, so we’ll keep it.

Quick review on how we got to our video data.

  • Article URL (extracted host name)
  • Article HTML source code (extracted contentid and spaceid)
  • md5 hash generator (ran on spaceid + “_” + host to get value of ak)
  • player.script.js (extracted appid)
  • Request to https://feapi-yvpub.yahooapis.jp/v1/content/{contentid} with our contentid, appid, spaceid, and ak values.

Code

What might this extraction process look like in code? Here’s a rough example in Python:

import hashlib
import re
import requests


_VALID_URL = r'https?://(?P<host>(?:news|headlines)\.yahoo\.co\.jp)[^\d]*(?P<id>\d[\d-]*\d)?'

# More functions here...

def yahoojnews_extract(url):
    mobj = re.match(_VALID_URL, url)
    if not mobj:
        raise ValueError('Invalid url %s' % url)
    host = mobj.group('host')
    display_id = mobj.group('id') or host
    webpage = _download_webpage(url)

    title = _search_title(webpage)
    
    if display_id == host:
        # Headline page (w/ multiple BC playlists) ('news.yahoo.co.jp', 'headlines.yahoo.co.jp/videonews/', ...)
        return _playlist_result(webpage)

    # Article page
    description = _search_description(webpage)
    thumbnail = _search_thumbnail(webpage)

    space_id = _search_regex([
            r'<script[^>]+class=["\']yvpub-player["\'][^>]+spaceid=([^&"\']+)',
            r'YAHOO\.JP\.srch\.\w+link\.onLoad[^;]+spaceID["\' ]*:["\' ]+([^"\']+)',
            r'<!--\s+SpaceID=(\d+)'
        ], webpage, 'spaceid')
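    content_match = re.search(
        r'<script[^>]+class=(["\'])yvpub-player\1[^>]+contentid=(?P<contentid>[^&"\']+)',
        webpage,
    )
    if not content_match:
        # Fail with a clear message instead of an AttributeError on None
        raise ValueError('No contentid found in %s' % url)
    content_id = content_match.group('contentid')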

    r = requests.get(
        'https://feapi-yvpub.yahooapis.jp/v1/content/%s' % content_id,
        headers={
            'Accept': 'application/json, text/javascript, */*; q=0.01',
            'Origin': 'https://s.yimg.jp',
            'Host': 'feapi-yvpub.yahooapis.jp',
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36',
            'Referer': 'https://s.yimg.jp/images/yvpub/player/vamos/pc/latest/player.html',
        },
        params={
            'appid': 'gj0zaiZpPVZMTVFJR0F...VycbVjcmV0jng9Yju-',
            'output': 'json',
            'space_id': space_id,
            'domain': host,
            'ak': hashlib.md5('_'.join((space_id, host)).encode()).hexdigest(),
            'device_type': '1100',
        },
    )
    r.raise_for_status()
    json_data = r.json()

    formats = _parse_formats(json_data)

    return {
        'id': display_id,
        'title': title,
        'description': description,
        'thumbnail': thumbnail,
        'formats': formats,
    }

Example of Yahoo! Japan News article embedded video extraction.

Note in yahoojnews_extract() that I include header values in the request that match the actual values in the request made from our page to avoid suspicion. Once I have the JSON data, I pass it to _parse_formats to extract the video data (urls, fps, etc.) and return it along with other information such as the title.
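
The helper functions are omitted from the example above. As an illustration of what one of them could do, here is a simplified standalone stand-in for _search_regex, loosely modeled on the youtube-dl helper of the same name. It is an assumption about the behavior needed here, not the actual youtube-dl implementation:

```python
import re


def search_regex(patterns, text, name):
    """Return group 1 of the first pattern that matches, trying them in order."""
    if isinstance(patterns, str):
        patterns = [patterns]
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            return match.group(1)
    raise ValueError("Unable to extract %s" % name)


# Hypothetical page fragment; only the SpaceID value comes from the article.
page = "<!-- SpaceID=2078710353 -->"
print(search_regex(
    [r'spaceid=([^&"\']+)', r"<!--\s+SpaceID=(\d+)"],
    page,
    "spaceid",
))
```

Trying the patterns in order lets the most specific one win while still falling back to looser ones on pages with a different layout.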

View the extraction code I wrote for youtube-dl here or download and watch it in action:

$ pip install youtube-dl && youtube-dl https://news.yahoo.co.jp

Thanks for reading!

JavaScript vs Python: Will Python Replace JavaScript in Popularity by 2020?

JavaScript is currently the most commonly used programming language, but now Python is dishing out some stiff competition. Python has been steadily increasing in popularity, so much so that it is now the fastest-growing programming language. So will Python replace JavaScript in popularity by 2020?

This is the Clash of the Titans!!

And no…I am not talking about the Hollywood movie (don’t bother watching it…it’s horrible!). I am talking about JavaScript and Python, two of the most popular programming languages in existence today.

JavaScript is currently the most commonly used programming language (and has been for quite some time!), but now Python is dishing out some stiff competition. Python has been steadily increasing in popularity, so much so that it is now the fastest-growing programming language. So now the question is…Will Python replace JavaScript in popularity by 2020?

To understand the above question correctly, it is important to know more about JavaScript and Python as well as the reasons for their popularity. So let’s start with JavaScript first!

Why is JavaScript so popular?

JavaScript is a high-level, interpreted programming language that is most popular as a scripting language for Web pages. This means that if a web page is not just sitting there and displaying static information, then JavaScript is probably behind that. And that’s not all: there are even runtime environments such as Node.js that let JavaScript be used for server-side scripting.

JavaScript is an extremely popular language. And if my word doesn’t convince you, here are the facts!!!

According to the Stack Overflow Developer Survey Results 2019, JavaScript is the most commonly used programming language, used by 69.7% of professional developers. And this is a title it has claimed for the past seven years in a row.

In addition to that, the most commonly used Web Frameworks are jQuery, Angular.js and React.js (all of which incidentally use JavaScript). Now if that doesn’t demonstrate JavaScript’s popularity, what does?!

Image Source: Stackoverflow

So now the question arises…Why is JavaScript so popular?

Well, some of the reasons for that are:

  • JavaScript is used both on the client-side and the server-side. This means that it runs practically everywhere from browsers to powerful servers. This gives it an edge over other languages that are not so versatile.
  • JavaScript implements multiple paradigms ranging from OOP to procedural. This allows developers the freedom to experiment as they want.
  • JavaScript has a large community of enthusiasts that actively back the language. Without this, it would have been tough for JavaScript to establish the number one position it has.

Can Python Replace JavaScript in Popularity?

Python is an interpreted, general-purpose programming language that has multiple uses ranging from web applications to data analysis. This means that Python can be seen in complex websites such as YouTube or Instagram, in cloud computing projects such as OpenStack, in Machine Learning, etc. (basically everywhere!)

Python has been steadily increasing in popularity so much so that it is the fastest-growing major programming language today according to StackOverflow Developer Survey Results 2019.

This is further demonstrated by this Google Trends chart showing the growth of Python as compared to JavaScript over the last 5 years:

As shown in the above data, Python recorded increased search interest as compared to JavaScript for the first time around November 2017 and it has maintained its lead ever since. This shows remarkable growth in Python as compared to 5 years ago.

In fact, Stack Overflow forecast its future traffic using a model called STL and guess what…the prediction is that Python could potentially stay in the lead against JavaScript till 2020 at the least.

Image Source : Stackoverflow

All these trends indicate that Python is extremely popular and getting even more popular with time. Some of the reasons for this incredible performance of Python are given as follows:

  • Python is Easy To Use
    No one likes excessively complicated things and that’s one of the reasons for the growing popularity of Python. It is simple with an easily readable syntax and that makes it well loved by both seasoned developers and experimental students. In addition to this, Python is also supremely efficient. It allows developers to complete more work using fewer lines of code. With all these advantages, what’s not to love?!!
  • Python has a Supportive Community
    Python has been around since 1991 and that is ample time to create a supportive community. Because of this support, Python learners can easily improve their knowledge, which only leads to increasing popularity. And that’s not all! There are many resources available online to promote Python, ranging from official documentation to YouTube tutorials that are a big help for learners.
  • Python has multiple Libraries and Frameworks
    Python is already quite popular and consequently, it has hundreds of different libraries and frameworks that can be used by developers. These libraries and frameworks are really useful in saving time which in turn makes Python even more popular. Some of the popular libraries of Python are NumPy and SciPy for scientific computing, Django for web development, BeautifulSoup for XML and HTML parsing, scikit-learn for machine learning applications, nltk for natural language processing, etc.

So What’s the Conclusion?

While JavaScript is currently the most popular programming language, Python could soon strip it of this title based on its incredible growth rate. So it is entirely possible that Python could be the most popular programming language by 2020.

However, this will merely impact the relative popularity of these two languages and not specify which among them is the better language. That choice is entirely subjective and may depend on multiple factors such as project requirements, scalability, ease of learning as well as the future growth prospects.

JavaScript vs Python: Can Python outperform JavaScript in the next five years?

JavaScript and Python are two influential programming languages for building a wide range of applications. While JavaScript has been the dominant programming language for many years, Python’s fast growth threatens to dethrone the widely popular technology.

This is the Clash of the Titans!!

And no…I am not talking about the Hollywood movie (don’t bother watching it…it’s horrible!). I am talking about JavaScript and Python, two of the most popular programming languages in existence today.

JavaScript is currently the most commonly used programming language (and has been for quite some time!), but now Python is dishing out some stiff competition. Python has been steadily increasing in popularity, so much so that it is now the fastest-growing programming language. So now the question is…Will Python replace JavaScript in popularity by 2020?

To understand the above question correctly, it is important to know more about JavaScript and Python as well as the reasons for their popularity. So let’s start with JavaScript first!

Why is JavaScript so popular?

JavaScript is a high-level, interpreted programming language that is most popular as a scripting language for Web pages. This means that if a web page is not just sitting there and displaying static information, then JavaScript is probably behind that. And that’s not all: there are even runtime environments such as Node.js that let JavaScript be used for server-side scripting.

JavaScript is an extremely popular language. And if my word doesn’t convince you, here are the facts!!!

According to the Stack Overflow Developer Survey Results 2019, JavaScript is the most commonly used programming language, used by 69.7% of professional developers. And this is a title it has claimed for the past seven years in a row.

In addition to that, the most commonly used Web Frameworks are jQuery, Angular.js and React.js (all of which incidentally use JavaScript). Now if that doesn’t demonstrate JavaScript’s popularity, what does?!

Image Source: Stackoverflow

So now the question arises…Why is JavaScript so popular?

Well, some of the reasons for that are:

  • JavaScript is used both on the client-side and the server-side. This means that it runs practically everywhere from browsers to powerful servers. This gives it an edge over other languages that are not so versatile.
  • JavaScript implements multiple paradigms ranging from OOP to procedural. This allows developers the freedom to experiment as they want.
  • JavaScript has a large community of enthusiasts that actively back the language. Without this, it would have been tough for JavaScript to establish the number one position it has.

Can Python Replace JavaScript in Popularity?

Python is an interpreted, general-purpose programming language that has multiple uses ranging from web applications to data analysis. This means that Python can be seen in complex websites such as YouTube or Instagram, in cloud computing projects such as OpenStack, in Machine Learning, etc. (basically everywhere!)

Python has been steadily increasing in popularity so much so that it is the fastest-growing major programming language today according to StackOverflow Developer Survey Results 2019.

This is further demonstrated by this Google Trends chart showing the growth of Python as compared to JavaScript over the last 5 years:

As shown in the above data, Python recorded increased search interest as compared to JavaScript for the first time around November 2017 and it has maintained its lead ever since. This shows remarkable growth in Python as compared to 5 years ago.

In fact, Stack Overflow forecast its future traffic using a model called STL and guess what…the prediction is that Python could potentially stay in the lead against JavaScript till 2020 at the least.

Image Source : Stackoverflow

All these trends indicate that Python is extremely popular and getting even more popular with time. Some of the reasons for this incredible performance of Python are given as follows:

  • Python is Easy To Use
    No one likes excessively complicated things and that’s one of the reasons for the growing popularity of Python. It is simple with an easily readable syntax and that makes it well loved by both seasoned developers and experimental students. In addition to this, Python is also supremely efficient. It allows developers to complete more work using fewer lines of code. With all these advantages, what’s not to love?!!
  • Python has a Supportive Community
    Python has been around since 1991 and that is ample time to create a supportive community. Because of this support, Python learners can easily improve their knowledge, which only leads to increasing popularity. And that’s not all! There are many resources available online to promote Python, ranging from official documentation to YouTube tutorials that are a big help for learners.
  • Python has multiple Libraries and Frameworks
    Python is already quite popular and consequently, it has hundreds of different libraries and frameworks that can be used by developers. These libraries and frameworks are really useful in saving time which in turn makes Python even more popular. Some of the popular libraries of Python are NumPy and SciPy for scientific computing, Django for web development, BeautifulSoup for XML and HTML parsing, scikit-learn for machine learning applications, nltk for natural language processing, etc.

So What’s the Conclusion?

While JavaScript is currently the most popular programming language, Python could soon strip it of this title based on its incredible growth rate. So it is entirely possible that Python could be the most popular programming language by 2020.

However, this will merely impact the relative popularity of these two languages and not specify which among them is the better language. That choice is entirely subjective and may depend on multiple factors such as project requirements, scalability, ease of learning as well as the future growth prospects.

Render HTML with Vanilla JavaScript and lit-html

Sometimes you need to render HTML elements on a web page. And like Goldilocks' search for "just right", you have to try a few techniques before you find the right one. Using a framework may be too hard. Using pure HTML and the DOM API may be too soft. What you need is something in the middle that is just right. Is lit-html "just right"? Let's find out.

First, I'll show how this all works. Then at the end of this article, I'll explain everything you need to get started with lit-html to try this for yourself.

When you're done, you can push your HTML app with lit-html to the cloud to see it in all of its glory! I included a link to a free Azure trial, so you can try it yourself.

The Sample App

Here is the app I'll demonstrate in this article. It fetches a list of heroes and renders them when you click the button. It also renders a progress indicator while it is fetching.

What's the Value of lit-html?

When you focus on rendering content, and nothing else, lit-html is a good fit. It works closely with the DOM to render content, and refresh it in an optimal manner. The docs can provide you with more details, but the basic code for lit-html looks like this.

// Credit: As seen in official docs https://lit-html.polymer-project.org/guide/getting-started

// Import lit-html
import { html, render } from 'lit-html';

// Define a template
const myTemplate = name =>
  html`
    <p>Hello ${name}</p>
  `;

// Render the template to the document
render(myTemplate('World'), document.body);

You import lit-html, define a template, then render it to the DOM. That's it!

Rendering HTML

A progress bar is fairly basic. There is some HTML, and we show it when needed and hide it when it is not required. While we could use a template, or innerHTML, or the DOM API for this, let's see what this would look like with lit-html.

First, we get a reference to the element in the DOM where the progress bar will appear.

Then we define the template. This code looks and feels like JSX (or TSX). The advantage here is that you can write the HTML. You wrap the HTML in a template string (notice the back-tick character is used and not a single quote). Template strings allow you to span lines and insert variables where needed (we'll see this soon). The magic that makes this work is the html tag that precedes the template string. The html tag is what tells lit-html that you are about to define a template.

Next, we compile the template and pass those results to lit-html's render function, which places the results in the DOM. Finally, we hide or show the progress bar as needed.

function showProgress(show = true) {
  const container = document.getElementById('progress-placeholder');

  const template: () => TemplateResult = () => html`
    <progress class="progress is-medium is-info" max="100"></progress>
  `;
  const result = template();
  render(result, container);

  container.style.display = show ? 'block' : 'none';
}

Now you can run this showProgress function any time you want to show the progress bar.

Note that when a template is re-rendered, the only part that is updated is the data that changed. If no data changed, nothing is updated.

Rendering HTML with Dynamic Values

The progress bar does not change each time it is rendered. You will have situations where you want your HTML to change. For example, you may have a message area on your web app that shows a styled message box with a title and a message. The title and message will change every time you show the message area. Now you have dynamic values.

The HTML is defined with a template string, so it is trivial to add a variable into it. Notice the code below adds a title and text into the template, using the ${data.title} and ${data.text} syntax, respectively.

Then the template is compiled and rendered where needed.

When this template is re-rendered, the only part that is updated is the data that changed. In this case, that's the title and text.

function showMessage(text: string, title = 'Info') {
  const template: (data: any) => TemplateResult = (data: Message) => html`
    <div id="message-box" class="message is-info">
      <h3 class="message-header">${data.title}</h3>
      <p class="message-body">${data.text}</p>
    </div>
  `;

  const el = document.getElementById('message-placeholder');
  const result = template({ title, text });
  render(result, el);

  el.style.visibility = !!text ? 'visible' : 'hidden';
}

Rendering a List

Things get a little more real when we render a list. Let's think about that for a moment. A list requires that we have a plan if there is data and a backup plan if there is no data. A list requires that we render the same thing for each row, and we don't know how many rows we have. A list requires that we pass different values for each row, too. Then we have to take the rows and wrap them in a container such as a <ul> or a <table>.

So there is a little more logic here, regardless of whether we use lit-html or any other technique. Let's explore how the replaceHeroList function renders the rows using lit-html.

function replaceHeroList(heroes?: Hero[]) {
 const heroPlaceholder = document.querySelector('.hero-list');

 // Define the template
 let template: () => TemplateResult;

 if (heroes && heroes.length) {
   // Create the template for every hero row
   template = createList();
 } else {
   // Create the template with a simple "not found" message
   template = () =>
     html`
       <p>heroes not found</p>
     `;
 }

 // Compile the template
 const result = template();

 // Render the template
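 render(result, heroPlaceholder);

 // createList (shown below) is assumed to be nested here, inside
 // replaceHeroList, since it reads `heroes` without taking a parameter.
}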

Notice that when there are heroes, we call the createList function. This function begins by creating an array of TemplateResult. So for every hero in the heroes array, we define a template that represents the <li> containing the HTML that displays that respective hero.

Then we create another template that contains the <ul> and embeds the array of hero templates. It's pretty cool that we can embed templates like this! Finally, we return it all and let the logic compile the templates and render them.

function createList() {
  // Create an array of the templates for each hero
  const templates: TemplateResult[] = heroes.map(hero => {
    return html`
      <li>
        <div class="card">
          <div class="card-content">
            <div class="content">
              <div class="name">${hero.name}</div>
              <div class="description">${hero.description}</div>
            </div>
          </div>
        </div>
      </li>
    `;
  });

  // Create a template that includes the hero templates
  const ulTemplate: () => TemplateResult = () =>
    html`
      <ul>
        ${templates}
      </ul>
    `;
  return ulTemplate;
}

Summary

When you want to render HTML, lit-html is a fast and light-weight option. Is it better than using templates and the DOM API? You'll have to decide what is best for you. But the real story here is that you have another great option to consider when determining the right tool for your job.

Epilogue

You can also get editor help with your lit-html templates. Notice the image below shows the syntax highlighting for the HTML template!

Setup

You can install the lit-html package with npm.

npm install lit-html

Alternatively, you can load it directly from the unpkg.com CDN:

import { html, render } from 'https://unpkg.com/lit-html?module';

You have a choice here. npm is my preference, but feel 100% free to use the CDN if that suits you.

TypeScript and lit-html

You only need to include the library for lit-html and you're done. But I like to use TypeScript, and I absolutely recommend enabling your tooling to work great with TypeScript and lit-html.

Let me be very clear here - you do not need TypeScript. I choose to use it because it helps identify mistakes while I write code. If you don't want TypeScript, you can opt to use plain JavaScript.

Here are the steps to make TypeScript and lit-html light up together:

  1. Install TypeScript support for lit-html
  2. Configure your tsconfig.json file
  3. Install the VS Code extension for lit-html

Run this command to install the plugin and TypeScript as development dependencies in your project.

npm install --save-dev typescript-lit-html-plugin typescript

Edit your tsconfig.json by adding the following to your compilerOptions section.

"compilerOptions": {
  "plugins": [
    {
      "name": "typescript-lit-html-plugin"
    }
  ]
}

Finally, install the VS Code extension for lit-html.

Now you get syntax highlighting for all of your lit-html templates!