GetOpenFileNameA has trouble with UTF-8 locale and UTF-8 encoded pathname

Alex Villacís Lasso a_villacis at palosanto.com
Mon Nov 21 10:23:31 CST 2005


Consider the following MSVC program:
--------------------- cut -------------------------
// PruebaOpenDlg.cpp : Defines the entry point for the console application.
//
 
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>     // memset, strerror
 
#include <windows.h>
 
 
int main(int argc, char* argv[])
{
    OPENFILENAME ofn;       // common dialog box structure
    char szFile[260];       // buffer for file name
 
    // Initialize OPENFILENAME
    ZeroMemory(&ofn, sizeof(OPENFILENAME));
    ofn.lStructSize = sizeof(OPENFILENAME);
    ofn.hwndOwner = NULL;
    ofn.lpstrFile = szFile;
    ofn.nMaxFile = sizeof(szFile);
    ofn.lpstrFilter = "All\0*.*\0Text\0*.TXT\0";
    ofn.nFilterIndex = 1;
    ofn.lpstrFileTitle = NULL;
    ofn.nMaxFileTitle = 0;
    ofn.lpstrInitialDir = NULL;
//    ofn.Flags = OFN_PATHMUSTEXIST | OFN_FILEMUSTEXIST;
 
    // Display the Open dialog box.  
    memset(szFile, 0, sizeof(szFile));
    if (GetOpenFileName(&ofn)==TRUE) {
        char * p;
        FILE * hFile;
 
        printf("Chosen filename is: %s\n", ofn.lpstrFile);
        printf("Byte encoding is  :");
        for (p = ofn.lpstrFile; *p; p++) {
            printf(" (%c %02x)", *p, *p);
        }
        printf("\n");
 
        hFile = fopen(ofn.lpstrFile, "rb");
        if (hFile != NULL) {
            fclose(hFile);
            puts("File is readable through specified filename");
        } else {
            printf("Unable to reach file through %s - %s\n",
                ofn.lpstrFile, strerror(errno));
        }
    }
    return 0;
}
--------------------- cut -------------------------

Consider also the following Linux environment: the home directory is
/home/alex and is mapped to drive F: in dosdevices. The home directory
contains a directory named gatón (the name contains U+00F3 LATIN SMALL
LETTER O WITH ACUTE and is UTF-8 encoded as 0x67 0x61 0x74 0xC3 0xB3 0x6E),
inside of which is a sample file to be selected in the Open File dialog.
All tests were made on a Fedora Core 4 system with a *default*
LANG=es_EC.UTF-8.
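
To double-check the byte sequence quoted above, here is a minimal sketch of
UTF-8 encoding for a single code point. The function name utf8_encode is
hypothetical (it is not part of Wine or the Windows API), and the sketch
only handles code points up to U+FFFF.

```c
#include <stddef.h>

/* Hypothetical helper: encode one code point (up to U+FFFF) as UTF-8.
 * Returns the number of bytes written to out (1 to 3), or 0 if the
 * code point is not handled by this sketch. */
size_t utf8_encode(unsigned long cp, unsigned char out[3])
{
    if (cp < 0x80) {                      /* ASCII: one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* three bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    return 0;                             /* beyond the scope of this sketch */
}
```

Calling utf8_encode(0xF3, buf) yields the two bytes 0xC3 0xB3, which is the
ó in gatón as stored on the filesystem.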

The symptom is that, when wine runs with a UTF-8 locale (as specified with
the LANG environment variable) and an attempt is made to choose a filename
that is UTF-8 encoded in the filesystem, GetOpenFileNameA may return a byte
string that CreateFile and other file functions are unable to map to a
valid filename. Whether GetOpenFileNameA returns a valid filename seems to
depend on how the navigation is performed. If the application starts the
Open File dialog from the current directory and the user navigates by
changing directories only, an invalid filename is returned. However, if
the user first chooses a drive letter (such as F:) and then navigates from
there, the filename returned is valid.

The following tests illustrate the behavior. For each entry, the first two
lines are the conditions for the test. The remaining three lines are the
actual output from the supplied program, copied and pasted from the
console. The instances of \uffff are invalid character encodings as
displayed in the console.
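
A note on reading the dumps: values such as ffffffc3 and ffffffb3 are the
bytes 0xC3 and 0xB3. They show up sign-extended because the test program
passes a plain (signed) char to the %02x conversion. A minimal sketch of a
dump that avoids this, assuming a hypothetical helper named hex_dump, casts
each byte to unsigned char first:

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: write the bytes of s into out as space-separated
 * two-digit hex values. Returns the number of characters written, not
 * counting the terminating NUL. Stops early if out is too small. */
size_t hex_dump(const char *s, char *out, size_t size)
{
    size_t used = 0;

    if (size == 0)
        return 0;
    out[0] = '\0';

    /* Each byte needs at most 3 characters plus the terminating NUL. */
    for (; *s != '\0' && used + 3 < size; s++) {
        /* The cast to unsigned char prevents sign extension, so 0xC3
         * prints as "c3" instead of "ffffffc3". */
        used += (size_t)snprintf(out + used, size - used, "%s%02x",
                                 used ? " " : "",
                                 (unsigned)(unsigned char)*s);
    }
    return used;
}
```

With this helper, the directory name gatón in UTF-8 dumps as
67 61 74 c3 b3 6e.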

LANG=en_US
 From current directory /home/alex:
Chosen filename is: f:\gatón\Barenaked Ladies - One Week.mp3
Byte encoding is  : (f 66) (: 3a) (\ 5c) (g 67) (a 61) (t 74) (\uffff ffffffc3) (\uffff ffffffb3) (n 6e) (\ 5c) (B 42) (a 61) (r 72) (e 65) (n 6e) (a 61) (k 6b) (e 65) (d 64) (  20) (L 4c) (a 61) (d 64) (i 69) (e 65) (s 73) (  20) (- 2d) (  20) (O 4f) (n 6e) (e 65) (  20) (W 57) (e 65) (e 65) (k 6b) (. 2e) (m 6d) (p 70) (3 33)
File is readable through specified filename

LANG=en_US
 From explicit choice from drive F: :
Chosen filename is: F:\gatón\Barenaked Ladies - One Week.mp3
Byte encoding is  : (F 46) (: 3a) (\ 5c) (g 67) (a 61) (t 74) (\uffff ffffffc3) (\uffff ffffffb3) (n 6e) (\ 5c) (B 42) (a 61) (r 72) (e 65) (n 6e) (a 61) (k 6b) (e 65) (d 64) (  20) (L 4c) (a 61) (d 64) (i 69) (e 65) (s 73) (  20) (- 2d) (  20) (O 4f) (n 6e) (e 65) (  20) (W 57) (e 65) (e 65) (k 6b) (. 2e) (m 6d) (p 70) (3 33)
File is readable through specified filename

LANG=es_EC
 From current directory /home/alex:
Chosen filename is: f:\gatón\Barenaked Ladies - One Week.mp3
Byte encoding is  : (f 66) (: 3a) (\ 5c) (g 67) (a 61) (t 74) (\uffff ffffffc3) (\uffff ffffffb3) (n 6e) (\ 5c) (B 42) (a 61) (r 72) (e 65) (n 6e) (a 61) (k 6b) (e 65) (d 64) (  20) (L 4c) (a 61) (d 64) (i 69) (e 65) (s 73) (  20) (- 2d) (  20) (O 4f) (n 6e) (e 65) (  20) (W 57) (e 65) (e 65) (k 6b) (. 2e) (m 6d) (p 70) (3 33)
File is readable through specified filename

LANG=es_EC
 From explicit choice from drive F: :
Chosen filename is: F:\gatón\Barenaked Ladies - One Week.mp3
Byte encoding is  : (F 46) (: 3a) (\ 5c) (g 67) (a 61) (t 74) (\uffff ffffffc3) (\uffff ffffffb3) (n 6e) (\ 5c) (B 42) (a 61) (r 72) (e 65) (n 6e) (a 61) (k 6b) (e 65) (d 64) (  20) (L 4c) (a 61) (d 64) (i 69) (e 65) (s 73) (  20) (- 2d) (  20) (O 4f) (n 6e) (e 65) (  20) (W 57) (e 65) (e 65) (k 6b) (. 2e) (m 6d) (p 70) (3 33)
File is readable through specified filename

LANG=es_EC.UTF-8
 From current directory /home/alex:
Chosen filename is: f:\gatón\Barenaked Ladies - One Week.mp3
Byte encoding is  : (f 66) (: 3a) (\ 5c) (g 67) (a 61) (t 74) (\uffff ffffffc3) (\uffff ffffffb3) (n 6e) (\ 5c) (B 42) (a 61) (r 72) (e 65) (n 6e) (a 61) (k 6b) (e 65) (d 64) (  20) (L 4c) (a 61) (d 64) (i 69) (e 65) (s 73) (  20) (- 2d) (  20) (O 4f) (n 6e) (e 65) (  20) (W 57) (e 65) (e 65) (k 6b) (. 2e) (m 6d) (p 70) (3 33)
Unable to reach file through f:\gatón\Barenaked Ladies - One Week.mp3 - No such file or directory

LANG=es_EC.UTF-8
 From explicit choice from drive F: :
Chosen filename is: F:\gat\uffffn\Barenaked Ladies - One Week.mp3
Byte encoding is  : (F 46) (: 3a) (\ 5c) (g 67) (a 61) (t 74) (\uffff fffffff3) (n 6e) (\ 5c) (B 42) (a 61) (r 72) (e 65) (n 6e) (a 61) (k 6b) (e 65) (d 64) (  20) (L 4c) (a 61) (d 64) (i 69) (e 65) (s 73) (  20) (- 2d) (  20) (O 4f) (n 6e) (e 65) (  20) (W 57) (e 65) (e 65) (k 6b) (. 2e) (m 6d) (p 70) (3 33)
File is readable through specified filename

Case 5 (LANG=es_EC.UTF-8, starting from the current directory) is
incorrect, but it is the easiest one to hit in UTF-8 locales.

This problem is significant because all Fedora distributions since at least
Fedora Core 2 have UTF-8 support, which is probably enabled in non-US
locales. Other popular distributions probably have this UTF-8 support
enabled too.

I am posting this on wine-devel instead of creating a bug report because I
wanted to receive some comments on what the expected behavior should be
before trying to submit a patch myself. Unless somebody says otherwise, I
would try to submit a patch that makes case 5 behave like case 6, by
modifying the encoding of the ANSI string to match what the file-open
functions expect for the filename.

However, this essentially requires an answer to the following question:
should non-Unicode strings that represent filenames be UTF-8 encoded, or
locale encoded? In the UTF-8 locales, GetOpenFileNameA sometimes seems to
return UTF-8 encoded strings, but the file-open functions expect
locale-encoded strings (in my case, ISO-8859-1). Hence the incorrect
behavior. How would the answer change (if at all) for Chinese or Japanese
locales, which need multibyte characters?
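
To make the difference between case 5 and case 6 concrete, here is a
minimal sketch of the re-encoding step in portable C. It is only an
illustration under the assumption that the target codepage is ISO-8859-1,
as in my setup; it is not how Wine does the conversion, and the function
name utf8_to_latin1 is hypothetical. It also does not generalize to
multibyte codepages such as the ones used for Chinese or Japanese.

```c
#include <stddef.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string to
 * ISO-8859-1. Returns 0 on success, or -1 if the input is malformed,
 * contains a character outside U+0000..U+00FF, or dst is too small. */
int utf8_to_latin1(const char *src, char *dst, size_t dst_size)
{
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *d = (unsigned char *)dst;
    size_t n = 0;

    if (dst_size == 0)
        return -1;

    while (*s != 0) {
        unsigned int cp;

        if (*s < 0x80) {
            cp = *s++;                         /* ASCII passes through */
        } else if ((*s == 0xC2 || *s == 0xC3) && (s[1] & 0xC0) == 0x80) {
            /* Two-byte sequences with lead byte C2 or C3 cover exactly
             * U+0080..U+00FF, the upper half of ISO-8859-1. */
            cp = ((unsigned int)(*s & 0x1F) << 6) | (s[1] & 0x3F);
            s += 2;
        } else {
            return -1;                         /* not representable */
        }

        if (n + 1 >= dst_size)
            return -1;                         /* keep room for the NUL */
        d[n++] = (unsigned char)cp;
    }
    d[n] = '\0';
    return 0;
}
```

Applied to the path from case 5, the bytes c3 b3 collapse into the single
byte f3, which is the form that case 6 returns and that the file-open
functions accept here.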

Alex Villacís Lasso



