Converting Numeric Character References to NSString

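When processing data, you will sometimes run into values that begin with &#, for example &#931; (which renders as Σ). This is Numeric Character Reference (NCR) encoding.
An NCR consists of an ampersand (&) followed by a number sign (#), then the Unicode code point of the character, and finally a semicolon, for example: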

&#nnnn;
or
&#xhhhh;

Here nnnn is the character's code point written in decimal, and hhhh is the same code point written in hexadecimal.
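To make the two forms concrete, here is a minimal sketch in plain C (which also compiles inside an Objective-C file) that parses a single reference into its code point. The names ncr_digit_value and ncr_parse are hypothetical helpers made up for this illustration, not part of any framework.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: value of digit c in the given base, or -1 if invalid. */
int ncr_digit_value(char c, int base)
{
    int v;
    if (c >= '0' && c <= '9') v = c - '0';
    else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
    else return -1;
    return v < base ? v : -1;
}

/*
 * Hypothetical helper: parse one numeric character reference at the start of s.
 * On success, stores the code point in *code_point and returns the number of
 * bytes consumed (including "&#" and ";"). Returns 0 if s does not start with
 * a well-formed reference or the value is above U+10FFFF.
 */
size_t ncr_parse(const char *s, unsigned long *code_point)
{
    assert(s != NULL && code_point != NULL);

    if (strncmp(s, "&#", 2) != 0) {
        return 0;
    }

    const char *p = s + 2;
    int base = 10;
    if (*p == 'x' || *p == 'X') {   /* "&#x" introduces hexadecimal digits */
        base = 16;
        p++;
    }

    unsigned long value = 0;
    int digits = 0;
    int d;
    while ((d = ncr_digit_value(*p, base)) >= 0) {
        value = value * (unsigned long)base + (unsigned long)d;
        if (value > 0x10FFFF) {
            return 0;               /* beyond the Unicode range */
        }
        p++;
        digits++;
    }

    /* At least one digit and the closing semicolon are required. */
    if (digits == 0 || *p != ';') {
        return 0;
    }

    *code_point = value;
    return (size_t)(p - s) + 1;
}
```

Both &#931; and &#x3A3; yield the same code point (931, that is U+03A3), while a reference without the closing semicolon is rejected.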
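On iOS there are two ways to handle this encoding. One uses NSAttributedString, which is simple but extremely slow. The other is to write the conversion yourself.
The code for both implementations follows (the hand-written conversion first, then the NSAttributedString version):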

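// Category method on NSString. Decodes decimal (&#nnnn;) and hexadecimal (&#xhhhh;)
// references and leaves all other text untouched.
-(NSString *)toUnicodeString
{
    if (![self containsString:@"&#"]) {
        return self;
    }

    // Capture group 1 holds hexadecimal digits, capture group 2 holds decimal digits.
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));" options:0 error:NULL];
    NSMutableString *desString = [[NSMutableString alloc] init];
    __block NSUInteger last = 0;

    [regex enumerateMatchesInString:self options:0 range:NSMakeRange(0, [self length]) usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop) {
        // Copy the literal text between the previous reference and this one.
        [desString appendString:[self substringWithRange:NSMakeRange(last, match.range.location - last)]];
        last = NSMaxRange(match.range);

        BOOL isHex = ([match rangeAtIndex:1].location != NSNotFound);
        NSString *digits = [self substringWithRange:[match rangeAtIndex:(isHex ? 1 : 2)]];
        unsigned long long value = strtoull([digits UTF8String], NULL, isHex ? 16 : 10);

        // Keep the reference as-is if it is not a valid Unicode scalar value.
        if (value == 0 || value > 0x10FFFF || (value >= 0xD800 && value <= 0xDFFF)) {
            [desString appendString:[self substringWithRange:match.range]];
            return;
        }

        // Build the character from UTF-32 so that code points above U+FFFF also work.
        uint32_t scalar = CFSwapInt32HostToLittle((uint32_t)value);
        NSString *decoded = [[NSString alloc] initWithBytes:&scalar length:sizeof(scalar) encoding:NSUTF32LittleEndianStringEncoding];
        [desString appendString:(decoded ?: @"")];
    }];

    // Append whatever follows the last reference.
    [desString appendString:[self substringFromIndex:last]];
    return desString;
}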
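
// Simpler but much slower alternative: let the HTML importer of NSAttributedString decode the text.
// The HTML importer should be called on the main thread, and it also interprets any HTML tags in the string.
-(NSString *)toUnicodeString2
{
    NSError *error = nil;
    NSData *encodedData = [self dataUsingEncoding:NSUTF8StringEncoding];
    // State the encoding explicitly so that non-ASCII text is not misread.
    NSDictionary *options = @{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
                              NSCharacterEncodingDocumentAttribute: @(NSUTF8StringEncoding)};

    NSAttributedString *attributedString = [[NSAttributedString alloc] initWithData:encodedData options:options documentAttributes:nil error:&error];

    // Fall back to the original string if the import fails.
    return attributedString ? [attributedString string] : self;
}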
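
One detail worth keeping in mind when writing the conversion by hand: NSString stores text as UTF-16 code units, so a code point up to U+FFFF fits in one unit, while anything larger needs two units (a surrogate pair). The following is only a sketch of that split in plain C (which also compiles inside an Objective-C file); the function name ncr_to_utf16 is hypothetical and chosen just for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode one Unicode code point as UTF-16 code units.
 * Writes 1 or 2 units to out (which must have room for 2) and returns the
 * count, or 0 if the value is not a valid Unicode scalar value.
 */
size_t ncr_to_utf16(uint32_t code_point, uint16_t out[2])
{
    assert(out != NULL);

    /* Surrogate values and anything above U+10FFFF cannot be encoded. */
    if (code_point > 0x10FFFF ||
        (code_point >= 0xD800 && code_point <= 0xDFFF)) {
        return 0;
    }

    if (code_point < 0x10000) {
        out[0] = (uint16_t)code_point;      /* fits in a single unit */
        return 1;
    }

    /* Split the remaining 20 bits into a high and a low surrogate. */
    uint32_t v = code_point - 0x10000;
    out[0] = (uint16_t)(0xD800 + (v >> 10));
    out[1] = (uint16_t)(0xDC00 + (v & 0x3FF));
    return 2;
}
```

The resulting units could then be handed to +[NSString stringWithCharacters:length:]. For example, &#931; produces the single unit 0x03A3, while &#x1F600; produces the pair 0xD83D 0xDE00.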