I have a problem converting legacy Access databases to MySQL. Access ported over multiple generations and in .accdb format. I assume they are encoded in Latin 1 case insensitive accent sensitive. I am wrestling with early poor decisions Access allowed me to make and I was too naive to know better.
The main field is a varchar field 255 char ("Title") with a unique index and contains text strings. The database is a long annotated list of single words and phrases. Most are short, but hundreds of thousands of entries are over 191 characters. I need them to be unique and accent sensitive (e.g. Aché are indigenous peoples in Paraguay. ache is to hurt).
If I import to utf8mb4, the maximum index length is 191 characters. I don't really need universal character and emoji capability in the Title field, but I do with others. I can create a "tiebreaker field" (188 char + 3 char tiebreaker unique index), But then I have the problem of identifying all the duplicates beforehand, and imports to MySQL drop hundreds of thousands of records.
I spent many hours trying to find documentation of this issue on the web. I've not encountered comprehensive treatments of these kinds of collation issues. I'm even only guessing at Access collation, since I haven't found a way to get Access to reveal it's exact encoding.
Any links or search terms I can use that are not so generic I just get hundreds of pages on collation-free descriptions of performing ideal transfers?
The main field is a varchar field 255 char ("Title") with a unique index and contains text strings. The database is a long annotated list of single words and phrases. Most are short, but hundreds of thousands of entries are over 191 characters. I need them to be unique and accent sensitive (e.g. Aché are indigenous peoples in Paraguay. ache is to hurt).
If I import to utf8mb4, the maximum index length is 191 characters. I don't really need universal character and emoji capability in the Title field, but I do with others. I can create a "tiebreaker field" (188 char + 3 char tiebreaker unique index), But then I have the problem of identifying all the duplicates beforehand, and imports to MySQL drop hundreds of thousands of records.
I spent many hours trying to find documentation of this issue on the web. I've not encountered comprehensive treatments of these kinds of collation issues. I'm even only guessing at Access collation, since I haven't found a way to get Access to reveal it's exact encoding.
Any links or search terms I can use that are not so generic I just get hundreds of pages on collation-free descriptions of performing ideal transfers?