Convert QUrl with percent encoding into string

Silicomancer picture Silicomancer · Jun 21, 2014 · Viewed 13.2k times · Source

I use a URL entered by the user as text to initialize a QUrl object. Later I want to convert the QUrl back into a string for displaying it and to check it using regular expression. This works fine as long as the user does not enter any percent encoded URLs.

Why doesn't the following example code work?

qDebug() << QUrl("http://test.com/query?q=%2B%2Be%3Axyz%2Fen").toDisplayString(QUrl::FullyDecoded); 

It simply doesn't decode any of the percent-encoded characters. It should print "http://test.com/query?q=++e:xyz/en" but it actually prints "http://test.com/query?q=%2B%2Be%3Axyz%2Fen".

I also tried a lot of other methods like fromUserInput() but I could not make the code work correctly in Qt5.3.

Can someone explain me how to do this and why the above code doesn't work (i.e. showing the decoded URL) even when using QUrl::FullyDecoded?

UPDATE

After getting the fromPercentEncoding() hint, I tried the following code:

QUrl UrlFromUserInput(const QString& input)
{
   QByteArray latin = input.toLatin1();
   QByteArray utf8 = input.toUtf8();
   if (latin != utf8)
   {
      // URL string containing unicode characters (no percent encoding expected)
      return QUrl::fromUserInput(input);
   }
   else
   {
      // URL string containing ASCII characters only (assume possible %-encoding)
      return QUrl::fromUserInput(QUrl::fromPercentEncoding(input.toLatin1()));
   }
}

This allows the user to input unicode URLs and percent-encoded URLs and it is possible to decode both kinds of URLs for displaying/matching. However the percent-encoded URLs did not work in QWebView... the web-server responded differently (it returned a different page). So obviously QUrl::fromPercentEncoding() is not a clean solution since it effectively changes the URL. I could create two QUrl objects in the above function... one constructed directly, one constructed using fromPercentEncoding(), using the first for QWebView and the latter for displaying/matching only... but this seems absurd.

Answer

Tay2510 picture Tay2510 · Jun 24, 2014

Conclusion

I've done some research, the conclusion so far is: absurd.

QUrl::fromPercentEncoding() is the way to go and what OP has done in the UPDATE section should've been the accepted answer to the question in title.

I think Qt's document of QUrl::toDisplayString is a little bit misleading :

"Returns a human-displayable string representation of the URL. The output can be customized by passing flags with options. The option RemovePassword is always enabled, since passwords should never be shown back to users."

Actually it doesn't claim any decoding ability, the document here is unclear about it's behavior. But at least the password part is true. I've found some clues on Gitorious:

"Add QUrl::toDisplayString(), which is toString() without password. And fix documentation of toString() which said this was the method to use for displaying to humans, while this has never been true."


Test Code

In order to discern the decoding ability of different functions. The following code has been tested on Qt 5.2.1 (not tested on Qt 5.3 yet!)

QString target(/*path*/);

QUrl url_path(target);
qDebug() << "[Original String]:" << target;
qDebug() << "--------------------------------------------------------------------";
qDebug() << "(QUrl::toEncoded)          :" << url_path.toEncoded(QUrl::FullyEncoded);
qDebug() << "(QUrl::url)                :" << url_path.url();
qDebug() << "(QUrl::toString)           :" << url_path.toString(); 
qDebug() << "(QUrl::toDisplayString)    :" << url_path.toDisplayString(QUrl::FullyDecoded);
qDebug() << "(QUrl::fromPercentEncoding):" << url_path.fromPercentEncoding(target.toUtf8());

P.S. QUrl::url is just synonym for QUrl::toString.


Output

[Case 1]: When target path = "%_%" (test the functionality of encoding):

[Original String]: "%_%" 
-------------------------------------------------------------------- 
(QUrl::toEncoded)          : "%25_%25" 
(QUrl::url)                : "%25_%25" 
(QUrl::toString)           : "%25_%25" 
(QUrl::toDisplayString)    : "%25_%25" 
(QUrl::fromPercentEncoding): "%_%" 

[Case 2]: When target path = "Meow !" (test the functionality of encoding):

[Original String]: "Meow !" 
-------------------------------------------------------------------- 
(QUrl::toEncoded)          : "Meow%20!" 
(QUrl::url)                : "Meow !" 
(QUrl::toString)           : "Meow !" 
(QUrl::toDisplayString)    : "Meow%20!" // "Meow !" when using QUrl::PrettyDecoded mode
(QUrl::fromPercentEncoding): "Meow !" 

[Case 3]: When target path = "Meow|!" (test the functionality of encoding):

[Original String]: "Meow|!" 
-------------------------------------------------------------------- 
(QUrl::toEncoded)          : "Meow%7C!" 
(QUrl::url)                : "Meow%7C!" 
(QUrl::toString)           : "Meow%7C!" 
(QUrl::toDisplayString)    : "Meow|!" // "Meow%7C!" when using QUrl::PrettyDecoded mode
(QUrl::fromPercentEncoding): "Meow|!" 

[Case 4]: When target path = "http://test.com/query?q=++e:xyz/en" (none % encoded):

[Original String]: "http://test.com/query?q=++e:xyz/en" 
-------------------------------------------------------------------- 
(QUrl::toEncoded)          : "http://test.com/query?q=++e:xyz/en" 
(QUrl::url)                : "http://test.com/query?q=++e:xyz/en" 
(QUrl::toString)           : "http://test.com/query?q=++e:xyz/en" 
(QUrl::toDisplayString)    : "http://test.com/query?q=++e:xyz/en" 
(QUrl::fromPercentEncoding): "http://test.com/query?q=++e:xyz/en" 

[Case 5]: When target path = "http://test.com/query?q=%2B%2Be%3Axyz%2Fen" (% encoded):

[Original String]: "http://test.com/query?q=%2B%2Be%3Axyz%2Fen" 
-------------------------------------------------------------------- 
(QUrl::toEncoded)          : "http://test.com/query?q=%2B%2Be%3Axyz%2Fen" 
(QUrl::url)                : "http://test.com/query?q=%2B%2Be%3Axyz%2Fen" 
(QUrl::toString)           : "http://test.com/query?q=%2B%2Be%3Axyz%2Fen" 
(QUrl::toDisplayString)    : "http://test.com/query?q=%2B%2Be%3Axyz%2Fen" 
(QUrl::fromPercentEncoding): "http://test.com/query?q=++e:xyz/en" 

P.S. I also encounter the bug that Ilya mentioned in comments: Percent Encoding doesn't seem to be working for '+' in QUrl


Summary

The result of QUrl::toDisplayString is ambiguous. As the document says, the QUrl::FullyDecoded mode must be used with care. No matter what type of URL you got, encode them by QUrl::toEncode and display them with QUrl::fromPercentEncoding when necessary.

As for the malfunction of percent-encoded URLs in QWebView mentioned in OP, more details are needed to debug it. Different function and different mode used could be the reason.


Helpful Resources

  1. RFC 3986 (which QUrl conforms)
  2. Encode table
  3. Source of qurl.cpp on Gitorious