Game of Unicode

Detecting Unicode code points of characters

In this chapter, we are going to learn about Unicode code points and their applications.

Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language. The Unicode Standard has been adopted by such industry leaders as Apple, HP, IBM, JustSystems, Microsoft, Oracle, SAP, Sun, Sybase, Unisys and many others.
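To make the "unique number" idea concrete, here is a small sketch using only Python 3 built-ins. The character and the list of encodings are chosen just for illustration: the letter क keeps the same code point (2325, hex U+0915) even though each byte encoding stores it differently.

```python
# Illustrative choice of character: DEVANAGARI LETTER KA.
character = 'क'

# Each encoding produces different bytes, but decoding them always gives
# back the same character, and therefore the same code point.
for encoding in ['utf-8', 'utf-16-le', 'utf-32-le']:
    encoded = character.encode(encoding)
    decoded = encoded.decode(encoding)
    print(encoding, encoded, ord(decoded))  # ord(decoded) is 2325 every time
```

The bytes change from one encoding to the next (three bytes in UTF-8, two in UTF-16), while the code point does not. That code point is what the rest of this chapter works with.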

In this session, we will detect whether the content of a .docx file is written in Devanagari.

Suppose we have a file named 'test.docx'. We open it with the python-docx package:

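from docx import Document

# Open the existing file; there is no need to save it again just to read it.
document = Document('test.docx')
for paragraph in document.paragraphs:
    print(paragraph.text)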

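Now we can access the contents of 'test.docx' through the python-docx package: document.paragraphs is a list of paragraph objects, and the text attribute of each paragraph holds its text as a string.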

We also define a hard-coded list of code points to ignore:

excludeList = [32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 58, 59, 60, 61, 62, 63, 64, 91, 92, 93, 94, 95, 96, 123, 124, 125, 126, 161, 163, 165, 166, 169, 174, 175, 177, 183, 184, 188, 189, 190]

The list above contains the decimal code points of punctuation marks and symbols (such as the space, brackets, © and ½) that are common to almost all major languages. The next function uses it to ignore these characters.
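To see what the list actually contains, a short sketch like the following prints each code point next to its character and its official Unicode name, using the standard unicodedata module. Converting the list to a set (exclude_set, a name introduced here) is an optional tweak, not part of the original code. It makes each membership test constant-time.

```python
import unicodedata

excludeList = [32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45,
               46, 47, 58, 59, 60, 61, 62, 63, 64, 91, 92, 93, 94, 95,
               96, 123, 124, 125, 126, 161, 163, 165, 166, 169, 174,
               175, 177, 183, 184, 188, 189, 190]

# Show which character each decimal code point stands for.
for code_point in excludeList:
    character = chr(code_point)
    print(code_point, repr(character), unicodedata.name(character))

# Optional: a set answers "is this code point excluded?" in constant time.
exclude_set = set(excludeList)
```

The output shows ASCII punctuation plus a few Latin-1 symbols such as ©, ± and ½. The ASCII digits 0 to 9 (code points 48 to 57) are not in the list, so a Devanagari sentence containing them would not pass the check described below.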

Now let's look at the following function:

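def is_devnagri(paragraph):
    # Every character must be a shared symbol from excludeList or lie in the
    # Devanagari block U+0900 to U+097F (decimal 2304 to 2431, inclusive).
    for character in paragraph.text:
        code_point = ord(character)
        if code_point not in excludeList and not (2304 <= code_point <= 2431):
            print('This sentence is not in Devanagari')
            return False
    print('This sentence is in Devanagari')
    return True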

Note: ord(character) - Given a string representing one Unicode character, return an integer representing the Unicode code point of that character.
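Here is a quick illustration of ord() and its inverse chr(), assuming Python 3, where every str holds Unicode text. The sample characters are arbitrary.

```python
import unicodedata

# ord() turns a one-character string into its code point; chr() goes back.
for character in ['A', '©', 'क']:
    code_point = ord(character)
    assert chr(code_point) == character  # the round trip is lossless
    print(character, code_point, hex(code_point), unicodedata.name(character))
```

This prints 65 (0x41) for A, 169 (0xa9) for © and 2325 (0x915) for क. Since 2325 lies between 2304 and 2431, क falls inside the Devanagari range used by is_devnagri. ord() accepts exactly one character; passing a longer string raises a TypeError.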

1. The function takes a paragraph (a python-docx paragraph object) as its input parameter.

2. It loops over each character of the paragraph's text.

3. Inside the loop, an if statement checks two conditions, and a character passes if either one holds:

   a. the character is in excludeList, because these symbols are common to almost all major languages;

   OR

   b. the character's decimal code point lies in the range 2304 to 2431 (hex U+0900 to U+097F), which is the Devanagari block of Unicode.

If a character satisfies at least one of these conditions, it is treated as Devanagari; otherwise, it is not. A paragraph counts as Devanagari only when every one of its characters passes this check.
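Because paragraph.text is an ordinary Python string, the same test can be sketched on plain strings, which makes it easy to try without a .docx file. The helper name is_devanagari_text and the constants are hypothetical. The helper also adds one rule of its own that the original function does not have: the text must contain at least one character from the Devanagari block, so a string made only of punctuation (or an empty string) is not reported as Devanagari.

```python
# Hypothetical helper: the same test as is_devnagri, applied to a plain str.
excludeList = [32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45,
               46, 47, 58, 59, 60, 61, 62, 63, 64, 91, 92, 93, 94, 95,
               96, 123, 124, 125, 126, 161, 163, 165, 166, 169, 174,
               175, 177, 183, 184, 188, 189, 190]
EXCLUDED = set(excludeList)

DEVANAGARI_FIRST = 0x0900  # 2304
DEVANAGARI_LAST = 0x097F   # 2431


def is_devanagari_text(text):
    """Return True if every character is Devanagari or a shared symbol,
    and at least one character is actually Devanagari (an added rule)."""
    has_devanagari = False
    for character in text:
        code_point = ord(character)
        if DEVANAGARI_FIRST <= code_point <= DEVANAGARI_LAST:
            has_devanagari = True
        elif code_point not in EXCLUDED:
            # Any other character, e.g. a Latin letter or an ASCII digit.
            return False
    return has_devanagari


print(is_devanagari_text('नमस्ते दुनिया!'))  # True
print(is_devanagari_text('Hello, world'))    # False
print(is_devanagari_text('नमस्ते world'))     # False: Latin letters mixed in
print(is_devanagari_text('!?'))              # False: no Devanagari at all
```

To apply it to a document, one could loop over document.paragraphs and call is_devanagari_text(paragraph.text) for each paragraph. Devanagari digits such as १२३ lie inside U+0900 to U+097F and therefore pass, while ASCII digits do not.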
