URL Sanitization — The Why and How

To start with the basics, let’s look at what URL actually stands for and how you can become an expert at reading URLs, so you always know what you’re clicking.

What is a URL?

URL stands for Uniform Resource Locator. A URL is nothing more than the address of a given unique resource on the Web. In theory, each valid URL points to a unique resource. Such resources can be an HTML page, a CSS document, an image, etc. In practice, there are some exceptions, the most common being a URL pointing to a resource that no longer exists or that has moved.

Now, what is URL sanitization?

URL sanitization means exactly what you think it means: cleaning up a URL. But why would a URL need cleaning up? If we cut some parts of a URL, won’t we fail to arrive at the intended website? Let me explain.

URL Anatomy & Tracking Codes

https://www.amazon.com/gp/help/customer/display.html?nodeId=508510&ref_=nav_cs_customerservice

The link above consists of several parts. Let’s break it down to understand what each part means.

https://    (Scheme/protocol: HTTPS, Hypertext Transfer Protocol Secure)
www         (Subdomain)
amazon.com  (Domain: second-level domain “amazon” plus top-level domain “.com”)
/gp/help/customer/display.html   (File path)
?           (Separator delimiting the file path from the query parameters)
nodeId=508510   (Query parameter; here it appears to select which help page is shown)
&           (Separator between key-value pairs)
ref_=nav_cs_customerservice  (Tracking code)
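The same breakdown can be reproduced with Python’s standard urllib.parse module. This is a minimal sketch: urlsplit reports the host as a single netloc field, so the subdomain/domain split below is a naive string split that only works for simple hosts like this one.

```python
from urllib.parse import urlsplit, parse_qsl

url = ("https://www.amazon.com/gp/help/customer/display.html"
       "?nodeId=508510&ref_=nav_cs_customerservice")

parts = urlsplit(url)
print(parts.scheme)  # https
print(parts.netloc)  # www.amazon.com
print(parts.path)    # /gp/help/customer/display.html
print(parts.query)   # nodeId=508510&ref_=nav_cs_customerservice

# Naive split of the host into subdomain and domain. This only works for
# simple hosts; suffixes such as ".co.uk" would need a public-suffix list.
subdomain, domain = parts.hostname.split(".", 1)
print(subdomain, domain)  # www amazon.com

# Split the query string at '&' into pairs, then each pair at '='.
for key, value in parse_qsl(parts.query):
    print(f"{key} -> {value}")
# nodeId -> 508510
# ref_ -> nav_cs_customerservice
```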

Everything up to and including the file path is the ‘core’ of the URL. This discussion is about everything that follows the question mark (?).
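Two ways of acting on that distinction can be sketched in Python. core_url keeps only the core and drops the query string and fragment; strip_tracking removes only parameters on a blocklist. Both function names and the blocklist (ref_, fbclid, gclid, and any key starting with utm_) are hypothetical choices for illustration, since which parameters count as tracking varies from site to site. The sketch also assumes nodeId selects the page content, so strip_tracking keeps it.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical blocklist for illustration; real tracking keys vary by site.
TRACKING_KEYS = {"ref_", "ref", "fbclid", "gclid"}
TRACKING_PREFIXES = ("utm_",)


def core_url(url: str) -> str:
    """Keep scheme, host and path; drop the query string and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def strip_tracking(url: str) -> str:
    """Remove only the query parameters that match the blocklist."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_KEYS and not key.startswith(TRACKING_PREFIXES)
    ]
    # Rebuild the URL with the surviving parameters and the original fragment.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


url = ("https://www.amazon.com/gp/help/customer/display.html"
       "?nodeId=508510&ref_=nav_cs_customerservice")
print(core_url(url))
# https://www.amazon.com/gp/help/customer/display.html
print(strip_tracking(url))
# https://www.amazon.com/gp/help/customer/display.html?nodeId=508510
```

Dropping the whole query is simpler, but it can break links whose parameters select content; the blocklist approach keeps those parameters at the cost of maintaining the list. Because parse_qsl decodes values and urlencode re-encodes them, kept values may come back with normalized encoding (for example, %20 becomes +).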

medium.com