Hi I am trying to extract the rootdomain from URL string in Google Sheets. I know how to get the domain and I have the formula to remove www.
but now I realize it does not strip subdomain prefixes like 'mysite'.site.com; where mysite
is not stripped from the domain name.
Question: How can I retrieve the domain.com
rootdomain where the domain string contacts alphanumeric characters, then 1 dot, then alphanumeric characters (and nothing more)
Formula so far in Google Sheets:
=REGEXREPLACE(REGEXREPLACE(D3923;"(http(s)?://)?(www\.)?";"");"/.*";"")
Maybe this can be simplified ...
Test cases
https://www.domain.com/ => domain.com
https://domain.com/ => domain.com
http://www.domain.nl/ => domain.com
http://domain.de/ => domain.com
http://www.domain.co.uk/ => domain.co.uk
http://domain.co.au/ => domain.co.au
sub.domain.org/ => sub.domain.com
sub.domain.org => sub.domain.com
domain.com => domain.com
http://www.domain.nl?par=1 => domain.com
https://www.domain.nl/test/?par=1 => domain.com
http2://sub2.startpagina.nl/test/?par=1 => domain.com
Currently using:
=trim(REGEXEXTRACT(REGEXREPLACE(REGEXREPLACE(A2;"https?://";"");"^(w{3}\.)?";"")&"/";"([^/?]+)"))
Seems to work fine
Updated:7-7-2016
(thanks for all the help!)