GetOpenFileNameA has trouble with UTF-8 locale and UTF-8 encoded pathname
Alex Villacís Lasso
a_villacis at palosanto.com
Mon Nov 21 10:23:31 CST 2005
Consider the following MSVC program:
--------------------- cut -------------------------
// PruebaOpenDlg.cpp : Defines the entry point for the console application.
//
#include <stdio.h>
#include <stdlib.h>
#include <string.h>   // memset, strerror
#include <errno.h>
#include <windows.h>

int main(int argc, char* argv[])
{
    OPENFILENAME ofn;   // common dialog box structure
    char szFile[260];   // buffer for file name

    // Initialize OPENFILENAME
    ZeroMemory(&ofn, sizeof(OPENFILENAME));
    ofn.lStructSize = sizeof(OPENFILENAME);
    ofn.hwndOwner = NULL;
    ofn.lpstrFile = szFile;
    ofn.nMaxFile = sizeof(szFile);
    ofn.lpstrFilter = "All\0*.*\0Text\0*.TXT\0";
    ofn.nFilterIndex = 1;
    ofn.lpstrFileTitle = NULL;
    ofn.nMaxFileTitle = 0;
    ofn.lpstrInitialDir = NULL;
    // ofn.Flags = OFN_PATHMUSTEXIST | OFN_FILEMUSTEXIST;

    // Display the Open dialog box.
    memset(szFile, 0, sizeof(szFile));
    if (GetOpenFileName(&ofn)==TRUE) {
        char * p;
        FILE * hFile;

        printf("Chosen filename is: %s\n", ofn.lpstrFile);

        // Dump every byte of the returned name. char is signed here, so
        // bytes >= 0x80 are printed sign-extended (e.g. ffffffc3).
        printf("Byte encoding is :");
        for (p = ofn.lpstrFile; *p; p++) {
            printf(" (%c %02x)", *p, *p);
        }
        printf("\n");

        // Check whether the returned name can actually be opened.
        hFile = fopen(ofn.lpstrFile, "rb");
        if (hFile != NULL) {
            fclose(hFile);
            puts("File is readable through specified filename");
        } else {
            printf("Unable to reach file through %s - %s\n",
                   ofn.lpstrFile, strerror(errno));
        }
    }
    return 0;
}
--------------------- cut -------------------------
Consider also the following Linux environment: the home directory is
/home/alex, and it is mapped to drive F: in dosdevices. The home
directory contains a directory named gatón (the name contains a U+00F3
LATIN SMALL LETTER O WITH ACUTE and is UTF-8 encoded as 0x67 0x61 0x74
0xC3 0xB3 0x6E), inside of which there is a sample file that is to be
selected through the Open File dialog. All tests were made on a Fedora
Core 4 system with a *default* LANG=es_EC.UTF-8.
The symptom is that, when Wine runs with a UTF-8 locale (as specified
with the LANG environment variable) and an attempt is made to choose a
file whose name is UTF-8 encoded in the filesystem, GetOpenFileNameA may
return a byte string that CreateFile and other file functions are unable
to map to a valid filename.
Whether GetOpenFileNameA returns a valid filename or not seems to depend
on the way the navigation is performed. If the application starts the
Open File dialog in the current directory and the user navigates by
changing directories only, an invalid filename is returned. However, if
the user first chooses a drive letter (such as F:) and then navigates
from there, the filename returned is a valid one.
The following tests illustrate the behavior. For each entry, the first
two lines are the conditions for the test. The remaining three lines are
the actual output from the supplied program, copied and pasted from the
console. The instances of \uffff are invalid character encodings as
displayed in the console.
LANG=en_US
From current directory /home/alex:
Chosen filename is: f:\gatón\Barenaked Ladies - One Week.mp3
Byte encoding is : (f 66) (: 3a) (\ 5c) (g 67) (a 61) (t 74) (\uffff ffffffc3) (\uffff ffffffb3) (n 6e) (\ 5c) (B 42) (a 61) (r 72) (e 65) (n 6e) (a 61) (k 6b) (e 65) (d 64) ( 20) (L 4c) (a 61) (d 64) (i 69) (e 65) (s 73) ( 20) (- 2d) ( 20) (O 4f) (n 6e) (e 65) ( 20) (W 57) (e 65) (e 65) (k 6b) (. 2e) (m 6d) (p 70) (3 33)
File is readable through specified filename
LANG=en_US
From explicit choice from drive F: :
Chosen filename is: F:\gatón\Barenaked Ladies - One Week.mp3
Byte encoding is : (F 46) (: 3a) (\ 5c) (g 67) (a 61) (t 74) (\uffff ffffffc3) (\uffff ffffffb3) (n 6e) (\ 5c) (B 42) (a 61) (r 72) (e 65) (n 6e) (a 61) (k 6b) (e 65) (d 64) ( 20) (L 4c) (a 61) (d 64) (i 69) (e 65) (s 73) ( 20) (- 2d) ( 20) (O 4f) (n 6e) (e 65) ( 20) (W 57) (e 65) (e 65) (k 6b) (. 2e) (m 6d) (p 70) (3 33)
File is readable through specified filename
LANG=es_EC
From current directory /home/alex:
Chosen filename is: f:\gatón\Barenaked Ladies - One Week.mp3
Byte encoding is : (f 66) (: 3a) (\ 5c) (g 67) (a 61) (t 74) (\uffff ffffffc3) (\uffff ffffffb3) (n 6e) (\ 5c) (B 42) (a 61) (r 72) (e 65) (n 6e) (a 61) (k 6b) (e 65) (d 64) ( 20) (L 4c) (a 61) (d 64) (i 69) (e 65) (s 73) ( 20) (- 2d) ( 20) (O 4f) (n 6e) (e 65) ( 20) (W 57) (e 65) (e 65) (k 6b) (. 2e) (m 6d) (p 70) (3 33)
File is readable through specified filename
LANG=es_EC
From explicit choice from drive F: :
Chosen filename is: F:\gatón\Barenaked Ladies - One Week.mp3
Byte encoding is : (F 46) (: 3a) (\ 5c) (g 67) (a 61) (t 74) (\uffff ffffffc3) (\uffff ffffffb3) (n 6e) (\ 5c) (B 42) (a 61) (r 72) (e 65) (n 6e) (a 61) (k 6b) (e 65) (d 64) ( 20) (L 4c) (a 61) (d 64) (i 69) (e 65) (s 73) ( 20) (- 2d) ( 20) (O 4f) (n 6e) (e 65) ( 20) (W 57) (e 65) (e 65) (k 6b) (. 2e) (m 6d) (p 70) (3 33)
File is readable through specified filename
LANG=es_EC.UTF-8
From current directory /home/alex:
Chosen filename is: f:\gatón\Barenaked Ladies - One Week.mp3
Byte encoding is : (f 66) (: 3a) (\ 5c) (g 67) (a 61) (t 74) (\uffff ffffffc3) (\uffff ffffffb3) (n 6e) (\ 5c) (B 42) (a 61) (r 72) (e 65) (n 6e) (a 61) (k 6b) (e 65) (d 64) ( 20) (L 4c) (a 61) (d 64) (i 69) (e 65) (s 73) ( 20) (- 2d) ( 20) (O 4f) (n 6e) (e 65) ( 20) (W 57) (e 65) (e 65) (k 6b) (. 2e) (m 6d) (p 70) (3 33)
Unable to reach file through f:\gatón\Barenaked Ladies - One Week.mp3 - No such file or directory
LANG=es_EC.UTF-8
From explicit choice from drive F: :
Chosen filename is: F:\gat\uffffn\Barenaked Ladies - One Week.mp3
Byte encoding is : (F 46) (: 3a) (\ 5c) (g 67) (a 61) (t 74) (\uffff fffffff3) (n 6e) (\ 5c) (B 42) (a 61) (r 72) (e 65) (n 6e) (a 61) (k 6b) (e 65) (d 64) ( 20) (L 4c) (a 61) (d 64) (i 69) (e 65) (s 73) ( 20) (- 2d) ( 20) (O 4f) (n 6e) (e 65) ( 20) (W 57) (e 65) (e 65) (k 6b) (. 2e) (m 6d) (p 70) (3 33)
File is readable through specified filename
Numbering the tests above 1 to 6 in the order listed, case 5
(LANG=es_EC.UTF-8, navigating from the current directory) is incorrect,
but is the easiest to hit in UTF-8 locales. This problem is significant
because all Fedora distributions since at least Fedora Core 2 have UTF-8
support, which is probably enabled in non-US locales. Other popular
distributions probably have this UTF-8 support enabled too. I am posting
this on wine-devel instead of creating a bug report because I wanted to
receive some comments on what the expected behavior should be before
trying to submit a patch myself. Unless somebody says otherwise, I would
try to submit a patch that makes case 5 behave like case 6 (the same
locale, navigating from drive F:), by modifying the encoding of the ANSI
string to match what the file-open functions would expect for the
filename.
However, this essentially requires an answer to the following question:
should non-Unicode strings that represent filenames be UTF-8 encoded, or
locale encoded? In the UTF-8 locales, GetOpenFileNameA sometimes seems to
assume UTF-8, but the file-open functions expect the locale encoding
(ISO-8859-1 in my case); hence the incorrect behavior. How would the
answer change (if at all) for Chinese or Japanese locales, which need
multibyte characters?
Alex Villacís Lasso