Regular expression to remove hostname and port from URL?

javascript

To remove the hostname and port from a URL in JavaScript, you can call replace() with the regular expression below. Note that it also strips the protocol (http:// or https://) if one is present:

const url = "https://example.com:8080/path/to/file.html";
const cleanUrl = url.replace(/^(https?:\/\/)?([\w.-]+)(:\d+)?(.*)/, "$4");
console.log(cleanUrl); // Output: /path/to/file.html

In this example, we start with a URL string that includes the protocol (http or https), hostname (example.com), port number (8080), and a path to a file (/path/to/file.html).

The regular expression pattern /^(https?:\/\/)?([\w.-]+)(:\d+)?(.*)/ matches the entire URL and captures everything after the hostname and port, here the path to the file, in the fourth capturing group ($4).

Here’s a breakdown of the regular expression pattern:

  • ^ matches the beginning of the string
  • (https?:\/\/)? optionally matches the protocol (http or https) followed by ://; the slashes are escaped as \/ because / delimits a regex literal
  • ([\w.-]+) matches the hostname, which consists of one or more word characters (\w), dots (.), and hyphens (-) (Note: this does not cover every valid authority, for example bracketed IPv6 addresses or a user:password@ prefix, but should be sufficient for most cases)
  • (:\d+)? optionally matches a colon followed by one or more digits, which represents the port number
  • (.*) matches the rest of the URL: the path to the file plus any query string and fragment
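
To see what each group captures, the same pattern can be passed to match() and the result destructured. This is a minimal sketch; the variable names (protocol, host, port, rest) are illustrative and not part of the original answer.

```javascript
// Same pattern as in the answer above.
const pattern = /^(https?:\/\/)?([\w.-]+)(:\d+)?(.*)/;

// match() returns [fullMatch, group1, group2, group3, group4];
// the leading comma skips the full match.
const [, protocol, host, port, rest] =
  "https://example.com:8080/path/to/file.html".match(pattern);

console.log(protocol); // https://
console.log(host);     // example.com
console.log(port);     // :8080
console.log(rest);     // /path/to/file.html
```

Note that the port group includes the leading colon, since the colon is inside the parentheses.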

The second argument to replace() is the replacement string, which in this case is just $4, a reference to the fourth capturing group: the path to the file without the hostname and port. If the pattern does not match at all, replace() returns the original string unchanged.
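
If you need this in more than one place, the call can be wrapped in a small function. The sketch below assumes the same pattern; the name stripHostAndPort and the sample URLs are hypothetical, chosen only to show how the pattern behaves on a few input shapes.

```javascript
// Pattern from the answer above, hoisted so it is compiled once.
const HOST_PORT_PATTERN = /^(https?:\/\/)?([\w.-]+)(:\d+)?(.*)/;

// Hypothetical helper: returns whatever follows the protocol, hostname, and port.
function stripHostAndPort(url) {
  return url.replace(HOST_PORT_PATTERN, "$4");
}

console.log(stripHostAndPort("https://example.com:8080/path/to/file.html")); // /path/to/file.html
console.log(stripHostAndPort("http://example.com/search?q=regex#top"));      // /search?q=regex#top
console.log(stripHostAndPort("example.com:3000/api"));                       // /api
console.log(stripHostAndPort("/already/relative"));                          // /already/relative
```

One caveat: because the protocol is optional, a relative path that does not start with a slash, such as docs/readme.md, has its first segment treated as a hostname and comes back as /readme.md. If your input can contain such paths, check for a protocol before calling the function.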