While I can successfully encode and decode the user data part of an SMS message when a UDH is not present, I'm having trouble doing so when a UDH is present (in this case, for concatenated SMS).
When I decode or encode the user data, do I need to prepend the UDH to the text before doing so?
This article provides an encoding routine sample that compensates for the UDH with padding bits (which I still don't completely understand) but it doesn't give an example of data being passed to the routine so I don't have a clear use case (and I could not find a decoding sample on the site): http://mobiletidings.com/2009/07/06/how-to-pack-gsm7-into-septets/.
So far, I have been able to get some results if I prepend the UDH to the user data before decoding it, but I suspect this is just a coincidence.
As an example (using values from https://en.wikipedia.org/wiki/Concatenated_SMS):
UDH := '050003000302';
ENCODED_USER_DATA_PART := 'D06536FB0DBABFE56C32'; // with padding, evidently
DecodedUserData := Decode7Bit(UDH + ENCODED_USER_DATA_PART);
Writeln(DecodedUserData);
Output: "ß@ø¿Æ @hello world"
EncodedUserData := Encode7Bit(DecodedUserData);
DecodedUserData := Decode7Bit(EncodedEncodedUserData);
Writeln(DecodedUserData);
Same Output: "ß@ø¿Æ @hello world"
Without prepending the UDH I get garbage:
DecodedUserData := Decode7Bit(ENCODED_USER_DATA_PART);
Writeln(DecodedUserData);
Output: "PKYY§An§eYI"
What is correct way of handling this?
Am I supposed to include the UDH with the text when encoding the user data?
Am I supposed to strip off the garbage characters after decoding, or am I (as I suspect) completely off base with this assumption?
While the decoding algorithm here seems to work without a UDH it doesn't seem to take any UDH information into account: Looking for GSM 7bit encode/decode algorithm.
I would be eternally grateful if someone could set me straight on the correct way to proceed. Any clear examples/code samples would be very much appreciated. ;-)
I will also provide a small sample application that includes the algorithms if anyone feels it will help solve the riddle.
EDIT 1:
I'm using Delphi XE2 Update 4 Hotfix 1
EDIT 2:
Thanks to help from @whosrdaddy, I was able to successfully get my encoding/decoding routines to work.
As a side note, I was curious as to why the user data needed to be on a 7-bit boundary when the UDH wasn't encoded with it, but the last sentence in the paragraph from the ETSI specification quoted by @whosrdaddy answered that:
If 7 bit data is used and the TP-UD-Header does not finish on a septet boundary then fill bits are inserted after the last Information Element Data octet so that there is an integral number of septets for the entire TP-UD header. This is to ensure that the SM itself starts on an octet boundary so that an earlier phase mobile will be capable of displaying the SM itself although the TP-UD Header in the TP-UD field may not be understood
My code is based in part on examples from the following resources:
Looking for GSM 7bit encode/decode algorithm
https://en.wikipedia.org/wiki/Concatenated_SMS
http://mobiletidings.com/2009/02/18/combining-sms-messages/
http://mobiletidings.com/2009/07/06/how-to-pack-gsm7-into-septets/
http://mobileforensics.files.wordpress.com/2007/06/understanding_sms.pdf
http://www.dreamfabric.com/sms/
http://www.mediaburst.co.uk/blog/concatenated-sms/
Here's the code for anyone else who's had trouble with SMS encoding/decoding. I'm sure it can be simplified/optimized (and comments are welcome), but I've tested it with several different permutations and UDH header lengths with success. I hope it helps.
unit SmsUtils;
interface
uses Windows, Classes, Math;
function Encode7Bit(const AText: string; AUdhLen: Byte;
out ATextLen: Byte): string;
function Decode7Bit(const APduData: string; AUdhLen: Integer): string;
implementation
var
g7BitToAsciiTable: array [0 .. 127] of Byte;
gAsciiTo7BitTable: array [0 .. 255] of Byte;
procedure InitializeTables;
var
AsciiValue: Integer;
i: Integer;
begin
// create 7-bit to ascii table
g7BitToAsciiTable[0] := 64; // @
g7BitToAsciiTable[1] := 163;
g7BitToAsciiTable[2] := 36;
g7BitToAsciiTable[3] := 165;
g7BitToAsciiTable[4] := 232;
g7BitToAsciiTable[5] := 223;
g7BitToAsciiTable[6] := 249;
g7BitToAsciiTable[7] := 236;
g7BitToAsciiTable[8] := 242;
g7BitToAsciiTable[9] := 199;
g7BitToAsciiTable[10] := 10;
g7BitToAsciiTable[11] := 216;
g7BitToAsciiTable[12] := 248;
g7BitToAsciiTable[13] := 13;
g7BitToAsciiTable[14] := 197;
g7BitToAsciiTable[15] := 229;
g7BitToAsciiTable[16] := 0;
g7BitToAsciiTable[17] := 95;
g7BitToAsciiTable[18] := 0;
g7BitToAsciiTable[19] := 0;
g7BitToAsciiTable[20] := 0;
g7BitToAsciiTable[21] := 0;
g7BitToAsciiTable[22] := 0;
g7BitToAsciiTable[23] := 0;
g7BitToAsciiTable[24] := 0;
g7BitToAsciiTable[25] := 0;
g7BitToAsciiTable[26] := 0;
g7BitToAsciiTable[27] := 0;
g7BitToAsciiTable[28] := 198;
g7BitToAsciiTable[29] := 230;
g7BitToAsciiTable[30] := 223;
g7BitToAsciiTable[31] := 201;
g7BitToAsciiTable[32] := 32;
g7BitToAsciiTable[33] := 33;
g7BitToAsciiTable[34] := 34;
g7BitToAsciiTable[35] := 35;
g7BitToAsciiTable[36] := 164;
g7BitToAsciiTable[37] := 37;
g7BitToAsciiTable[38] := 38;
g7BitToAsciiTable[39] := 39;
g7BitToAsciiTable[40] := 40;
g7BitToAsciiTable[41] := 41;
g7BitToAsciiTable[42] := 42;
g7BitToAsciiTable[43] := 43;
g7BitToAsciiTable[44] := 44;
g7BitToAsciiTable[45] := 45;
g7BitToAsciiTable[46] := 46;
g7BitToAsciiTable[47] := 47;
g7BitToAsciiTable[48] := 48;
g7BitToAsciiTable[49] := 49;
g7BitToAsciiTable[50] := 50;
g7BitToAsciiTable[51] := 51;
g7BitToAsciiTable[52] := 52;
g7BitToAsciiTable[53] := 53;
g7BitToAsciiTable[54] := 54;
g7BitToAsciiTable[55] := 55;
g7BitToAsciiTable[56] := 56;
g7BitToAsciiTable[57] := 57;
g7BitToAsciiTable[58] := 58;
g7BitToAsciiTable[59] := 59;
g7BitToAsciiTable[60] := 60;
g7BitToAsciiTable[61] := 61;
g7BitToAsciiTable[62] := 62;
g7BitToAsciiTable[63] := 63;
g7BitToAsciiTable[64] := 161;
g7BitToAsciiTable[65] := 65;
g7BitToAsciiTable[66] := 66;
g7BitToAsciiTable[67] := 67;
g7BitToAsciiTable[68] := 68;
g7BitToAsciiTable[69] := 69;
g7BitToAsciiTable[70] := 70;
g7BitToAsciiTable[71] := 71;
g7BitToAsciiTable[72] := 72;
g7BitToAsciiTable[73] := 73;
g7BitToAsciiTable[74] := 74;
g7BitToAsciiTable[75] := 75;
g7BitToAsciiTable[76] := 76;
g7BitToAsciiTable[77] := 77;
g7BitToAsciiTable[78] := 78;
g7BitToAsciiTable[79] := 79;
g7BitToAsciiTable[80] := 80;
g7BitToAsciiTable[81] := 81;
g7BitToAsciiTable[82] := 82;
g7BitToAsciiTable[83] := 83;
g7BitToAsciiTable[84] := 84;
g7BitToAsciiTable[85] := 85;
g7BitToAsciiTable[86] := 86;
g7BitToAsciiTable[87] := 87;
g7BitToAsciiTable[88] := 88;
g7BitToAsciiTable[89] := 89;
g7BitToAsciiTable[90] := 90;
g7BitToAsciiTable[91] := 196;
g7BitToAsciiTable[92] := 204;
g7BitToAsciiTable[93] := 209;
g7BitToAsciiTable[94] := 220;
g7BitToAsciiTable[95] := 167;
g7BitToAsciiTable[96] := 191;
g7BitToAsciiTable[97] := 97;
g7BitToAsciiTable[98] := 98;
g7BitToAsciiTable[99] := 99;
g7BitToAsciiTable[100] := 100;
g7BitToAsciiTable[101] := 101;
g7BitToAsciiTable[102] := 102;
g7BitToAsciiTable[103] := 103;
g7BitToAsciiTable[104] := 104;
g7BitToAsciiTable[105] := 105;
g7BitToAsciiTable[106] := 106;
g7BitToAsciiTable[107] := 107;
g7BitToAsciiTable[108] := 108;
g7BitToAsciiTable[109] := 109;
g7BitToAsciiTable[110] := 110;
g7BitToAsciiTable[111] := 111;
g7BitToAsciiTable[112] := 112;
g7BitToAsciiTable[113] := 113;
g7BitToAsciiTable[114] := 114;
g7BitToAsciiTable[115] := 115;
g7BitToAsciiTable[116] := 116;
g7BitToAsciiTable[117] := 117;
g7BitToAsciiTable[118] := 118;
g7BitToAsciiTable[119] := 119;
g7BitToAsciiTable[120] := 120;
g7BitToAsciiTable[121] := 121;
g7BitToAsciiTable[122] := 122;
g7BitToAsciiTable[123] := 228;
g7BitToAsciiTable[124] := 246;
g7BitToAsciiTable[125] := 241;
g7BitToAsciiTable[126] := 252;
g7BitToAsciiTable[127] := 224;
// create ascii to 7-bit table
ZeroMemory(@gAsciiTo7BitTable, SizeOf(gAsciiTo7BitTable));
for i := 0 to High(g7BitToAsciiTable) do
begin
AsciiValue := g7BitToAsciiTable[i];
gAsciiTo7BitTable[AsciiValue] := i;
end;
end;
function ConvertAsciiTo7Bit(const AText: string; AUdhLen: Byte): AnsiString;
const
ESC = #27;
ESCAPED_ASCII_CODES = [#94, #123, #125, #92, #91, #126, #93, #124, #164];
var
Septet: Byte;
Ch: AnsiChar;
i: Integer;
begin
for i := 1 to Length(AText) do
begin
Ch := AnsiChar(AText[i]);
if not(Ch in ESCAPED_ASCII_CODES) then
Septet := gAsciiTo7BitTable[Byte(Ch)]
else
begin
Result := Result + ESC;
case (Ch) of
#12: Septet := 10;
#94: Septet := 20;
#123: Septet := 40;
#125: Septet := 41;
#92: Septet := 47;
#91: Septet := 60;
#126: Septet := 61;
#93: Septet := 62;
#124: Septet := 64;
#164: Septet := 101;
else Septet := 0;
end;
end;
Result := Result + AnsiChar(Septet);
end;
end;
function Convert7BitToAscii(const AText: AnsiString): string;
const
ESC = #27;
var
TextLen: Integer;
Ch: Char;
i: Integer;
begin
Result := '';
TextLen := Length(AText);
i := 1;
while (i <= TextLen) do
begin
Ch := Char(AText[i]);
if (Ch <> ESC) then
Result := Result + Char(g7BitToAsciiTable[Ord(Ch)])
else
begin
Inc(i); // skip ESC
if (i <= TextLen) then
begin
Ch := Char(AText[i]);
case (Ch) of
#10: Ch := #12;
#20: Ch := #94;
#40: Ch := #123;
#41: Ch := #125;
#47: Ch := #92;
#60: Ch := #91;
#61: Ch := #126;
#62: Ch := #93;
#64: Ch := #124;
#101: Ch := #164;
end;
Result := Result + Ch;
end;
end;
Inc(i);
end;
end;
function StrToHex(const AText: AnsiString): AnsiString; overload;
var
TextLen: Integer;
begin
// set the text buffer size
TextLen := Length(AText);
// set the length of the result to double the string length
SetLength(Result, TextLen * 2);
// convert the string to hex
BinToHex(PAnsiChar(AText), PAnsiChar(Result), TextLen);
end;
function StrToHex(const AText: string): string; overload;
begin
Result := string(StrToHex(AnsiString(AText)));
end;
function HexToStr(const AText: AnsiString): AnsiString; overload;
var
ResultLen: Integer;
begin
// set the length of the result to half the Text length
ResultLen := Length(AText) div 2;
SetLength(Result, ResultLen);
// convert the hex back into a string
if (HexToBin(PAnsiChar(AText), PAnsiChar(Result), ResultLen) <> ResultLen) then
Result := 'Error Converting Hex To String: ' + AText;
end;
function HexToStr(const AText: string): string; overload;
begin
Result := string(HexToStr(AnsiString(AText)));
end;
function Encode7Bit(const AText: string; AUdhLen: Byte;
out ATextLen: Byte): string;
// AText: Ascii text
// AUdhLen: Length of UDH including UDH Len byte (e.g. '050003CC0101' = 6 bytes)
// ATextLen: returns length of text that was encoded. This can be different
// than Length(AText) due to escape characters
// Returns text as encoded PDU hex string
var
Text7Bit: AnsiString;
Pdu: AnsiString;
PduIdx: Integer;
PduLen: Byte;
PaddingBits: Byte;
BitsToMove: Byte;
Septet: Byte;
Octet: Byte;
PrevOctet: Byte;
ShiftedOctet: Byte;
i: Integer;
begin
Result := '';
Text7Bit := ConvertAsciiTo7Bit(AText, AUdhLen);
ATextLen := Length(Text7Bit);
BitsToMove := 0;
// determine how many padding bits needed based on the UDH
if (AUdhLen > 0) then
PaddingBits := 7 - ((AUdhLen * 8) mod 7)
else
PaddingBits := 0;
// calculate the number of bytes needed to store the 7-bit text
// along with any padding bits that are required
PduLen := Ceil(((ATextLen * 7) + PaddingBits) / 8);
// reserve space for the PDU bytes
Pdu := AnsiString(StringOfChar(#0, PduLen));
PduIdx := 1;
for i := 1 to ATextLen do
begin
if (BitsToMove = 7) then
BitsToMove := 0
else
begin
// convert the current character to a septet (7-bits) and make room for
// the bits from the next one
Septet := (Byte(Text7Bit[i]) shr BitsToMove);
if (i = ATextLen) then
Octet := Septet
else
begin
// convert the next character to a septet and copy the bits from it
// to the octet (PDU byte)
Octet := Septet or
Byte((Byte(Text7Bit[i + 1]) shl Byte(7 - BitsToMove)));
end;
Byte(Pdu[PduIdx]) := Octet;
Inc(PduIdx);
Inc(BitsToMove);
end;
end;
// The following code pads the pdu on the *right* by shifting it to the *left*
// by <PaddingBits>. It does this by using the same bit storage convention as
// the 7-bit compression routine above, by taking the most significant
// <PaddingBits> from each PDU byte and moving them to the least significant
// bits of the next PDU byte. If there is no room in the last PDU byte for the
// high bits of the previous byte that were removed, then those bits are
// placed into an additional byte reserved for this purpose.
// Note: <PduLen> has already been set to account for the reserved byte if
// it is required.
if (PaddingBits > 0) then
begin
SetLength(Result, (PduLen * 2));
PrevOctet := 0;
for PduIdx := 1 to PduLen do
begin
Octet := Byte(Pdu[PduIdx]);
if (PduIdx = 1) then
ShiftedOctet := Byte(Octet shl PaddingBits)
else
ShiftedOctet := Byte(Octet shl PaddingBits) or
Byte(PrevOctet shr (8 - PaddingBits));
Byte(Pdu[PduIdx]) := ShiftedOctet;
PrevOctet := Octet;
end;
end;
Result := string(StrToHex(Pdu));
end;
function Decode7Bit(const APduData: string; AUdhLen: Integer): string;
// APduData: Hex string representation of PDU data
// AUdhLen: Length of UDH including UDH Len (e.g. '050003CC0101' = 6 bytes)
// Returns decoded Ascii text
var
Pdu: AnsiString;
NumSeptets: Byte;
Septets: AnsiString;
PduIdx: Integer;
PduLen: Integer;
by: Byte;
currBy: Byte;
left: Byte;
mask: Byte;
nextBy: Byte;
Octet: Byte;
NextOctet: Byte;
PaddingBits: Byte;
ShiftedOctet: Byte;
i: Integer;
begin
Result := '';
PaddingBits := 0;
// convert hex string to bytes
Pdu := AnsiString(HexToStr(APduData));
PduLen := Length(Pdu);
// The following code removes padding at the end of the PDU by shifting it
// *right* by <PaddingBits>. It does this by taking the least significant
// <PaddingBits> from the following PDU byte and moving them to the most
// significant the current PDU byte.
if (AUdhLen > 0) then
begin
PaddingBits := 7 - ((AUdhLen * 8) mod 7);
for PduIdx := 1 to PduLen do
begin
Octet := Byte(Pdu[PduIdx]);
if (PduIdx = PduLen) then
ShiftedOctet := Byte(Octet shr PaddingBits)
else
begin
NextOctet := Byte(Pdu[PduIdx + 1]);
ShiftedOctet := Byte(Octet shr PaddingBits) or
Byte(NextOctet shl (8 - PaddingBits));
end;
Byte(Pdu[PduIdx]) := ShiftedOctet;
end;
end;
// decode
// number of septets in PDU after excluding the padding bits
NumSeptets := ((PduLen * 8) - PaddingBits) div 7;
Septets := AnsiString(StringOfChar(#0, NumSeptets));
left := 7;
mask := $7F;
nextBy := 0;
PduIdx := 1;
for i := 1 to NumSeptets do
begin
if mask = 0 then
begin
Septets[i] := AnsiChar(nextBy);
left := 7;
mask := $7F;
nextBy := 0;
end
else
begin
if (PduIdx > PduLen) then
Break;
by := Byte(Pdu[PduIdx]);
Inc(PduIdx);
currBy := ((by AND mask) SHL (7 - left)) OR nextBy;
nextBy := (by AND (NOT mask)) SHR left;
Septets[i] := AnsiChar(currBy);
mask := mask SHR 1;
left := left - 1;
end;
end; // for
// remove last character if unused
// this is kind of a hack, but frankly I don't know how else to compensate
// for it.
if (Septets[NumSeptets] = #0) then
SetLength(Septets, NumSeptets - 1);
// convert 7-bit alphabet to ascii
Result := Convert7BitToAscii(Septets);
end;
initialization
InitializeTables;
end.
no you don't include the UDH part when encoding, but you if read the GSM phase 2 specification on page 57, they mention this fact : "If 7 bit data is used and the TP-UD-Header does not finish on a septet boundary then fill bits are inserted after the last Information Element Data octet so that there is an integral number of septets for the entire TP-UD header". When you include a UDH part this could not be the case, so all you need to do is calculate the offset (= number of fill bits)
Calculating the offset, this code assumes that UDHPart is a AnsiString:
Len := Length(UDHPart) shr 1;
Offset := 7 - ((Len * 8) mod 7); // fill bits
now when encoding the 7bit data, you proceed as normal but at the end, you shift the data Offset bits to the left, this code has the encoded data in variable result (ansistring):
// fill bits
if Offset > 0 then
begin
v := Result;
Len := Length(v);
BytesRemain := ceil(((Len * 7)+Offset) / 8);
Result := StringOfChar(#0, BytesRemain);
for InPos := 1 to BytesRemain do
begin
if InPos = 1 then
Byte(Result[InPos]) := Byte(v[InPos]) shl offset
else
Byte(Result[InPos]) := (Byte(v[InPos]) shl offset) or (Byte(v[InPos-1]) shr (8 - offset));
end;
end;
Decoding is same thing really, you first shift the 7 bit data offset bits to the right before decoding...
I hope this will set you onto the right track...