Invalid character exception when adding Metadata to a CloudBlob

bPratik · Feb 15, 2013

Task

Upload a file to Azure Blob Storage under its original filename, and also store that filename as meta-data on the CloudBlob.

Problem

These characters are not permitted in the meta-data but are acceptable as the blob name:

š Š ñ Ñ ç Ç ÿ Ÿ ž Ž Ð œ Œ « » éèëêð ÉÈËÊ àâä ÀÁÂÃÄÅ àáâãäå ÙÚÛÜ ùúûüµ òóôõöø ÒÓÔÕÖØ ìíîï ÌÍÎÏ

Question

  • Is there a way to store these characters in the meta-data? Are we missing some setting that causes this exception?
  • Most of these characters are standard glyphs in some languages, so how should they be handled?
  • Is there any documentation available that advises about this issue? I found blob and meta-data naming conventions, but none about the data itself!

Code

var dirtyFileName      = file.FileName;
var normalizedFileName = file.FileName.CleanOffDiacriticAndNonASCII();

// The blob name accepts almost all characters that are acceptable as filenames in Windows
var blob = container.GetBlobReference(dirtyFileName);

// Assign the metadata and content type before uploading.
blob.Metadata["FileName"] = normalizedFileName;
blob.Attributes.Properties.ContentType = file.ContentType;

// Upload content to the blob, which will create the blob if it does not already exist.
// ERROR: Occurs here!
blob.UploadFromStream(file.InputStream);

blob.SetMetadata();
blob.SetProperties();

Error

Exception

Workarounds

Illegal characters in the filename are only the tip of the iceberg, singled out for the purpose of this question. The bigger picture is that we index these files using Lucene.net, and so we need a lot of meta-data to be stored on the blob. Please don't suggest storing it all separately in a database, just don't! Up until now we have been lucky to have come across only one file with diacritic characters.

So, at the moment we are making the effort to avoid saving the filename in the meta-data as a workaround!

Answer

bPratik · Feb 20, 2013

I have just had confirmation from the azure-sdk-for-net team on GitHub that only ASCII characters are valid as data within blob meta-data.

joeg commented:
The supported characters in the blob metadata must be ASCII characters. To work around this you can either escape the string ( percent encode), base64 encode etc.

Source on GitHub

So as a work-around, either:

  • escape the string (percent encode), base64 encode, etc, as suggested by joeg
  • use the techniques that I have mentioned in my other answer.
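
As a rough illustration of the first option, the sketch below shows one way the percent-encoding and base64 approaches might look in C#. The class name BlobMetadataEncoding and its method names are hypothetical and not part of the Azure SDK; the sketch uses only standard .NET calls (Uri.EscapeDataString, Uri.UnescapeDataString, Convert.ToBase64String, Convert.FromBase64String) and assumes UTF-8 is an acceptable byte encoding for the metadata values.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of the Azure SDK): converts arbitrary
// strings to ASCII-only values and back, so they can be stored as blob metadata.
public static class BlobMetadataEncoding
{
    // Returns true when every character is in the 7-bit ASCII range.
    public static bool IsAscii(string value)
    {
        foreach (char c in value)
        {
            if (c > 127) return false;
        }
        return true;
    }

    // Percent-encodes the value; non-ASCII characters become %XX
    // sequences of their UTF-8 bytes, so the result is ASCII only.
    public static string ToPercentEncoded(string value)
    {
        return Uri.EscapeDataString(value);
    }

    // Reverses ToPercentEncoded.
    public static string FromPercentEncoded(string value)
    {
        return Uri.UnescapeDataString(value);
    }

    // Base64-encodes the UTF-8 bytes of the value; the result is ASCII only.
    public static string ToBase64(string value)
    {
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(value));
    }

    // Reverses ToBase64.
    public static string FromBase64(string value)
    {
        return Encoding.UTF8.GetString(Convert.FromBase64String(value));
    }
}
```

With a helper like this, the assignment in the question could become blob.Metadata["FileName"] = BlobMetadataEncoding.ToPercentEncoded(file.FileName); and the value would be decoded with FromPercentEncoded when it is read back. Percent encoding leaves plain ASCII filenames largely readable in the stored value, whereas base64 obscures every value but produces a more predictable length.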