C strings conversion to Python
Last Updated: 02 Apr, 2019
For a C string represented as a pointer and length pair (char *, int), you must decide whether to present it to Python as a raw byte string or as a Unicode string. A bytes object can be built with Py_BuildValue() as follows:

// Pointer to C string data
char *s;
// Length of data (the "#" formats expect a Py_ssize_t;
// define PY_SSIZE_T_CLEAN before including Python.h)
Py_ssize_t len;
// Make a bytes object
PyObject *obj = Py_BuildValue("y#", s, len);

To create a Unicode string when it is known that s points to data encoded as UTF-8, use the "s#" format instead:

PyObject *obj = Py_BuildValue("s#", s, len);

If s is encoded in some other known encoding, a string can be made with PyUnicode_Decode():

PyObject *obj = PyUnicode_Decode(s, len, "encoding", "errors");
// Examples
obj = PyUnicode_Decode(s, len, "latin-1", "strict");
obj = PyUnicode_Decode(s, len, "ascii", "ignore");

If a wide string is represented as a wchar_t *, len pair, there are a few options, as shown below:
// Wide character string
wchar_t *w;
// Length in wide characters (Py_ssize_t, not int)
Py_ssize_t len;
// Option 1 - use Py_BuildValue()
PyObject *obj1 = Py_BuildValue("u#", w, len);
// Option 2 - use PyUnicode_FromWideChar()
PyObject *obj2 = PyUnicode_FromWideChar(w, len);
- Data coming from C must be explicitly decoded into a string according to some codec.
- Common encodings include ASCII, Latin-1, and UTF-8.
- If the encoding is not known, it is best to pass the data to Python as bytes instead.
- Python always copies the string data it is given when making an object.
- Also, for better reliability, strings should be created using both a pointer and a size rather than relying on NUL-terminated data.