As a long-time PHP developer and encoding specialist, I was surprised to hear that the popular utf8_encode() and utf8_decode() functions trigger deprecation notices as of PHP 8.2.
These functions have been around for ages, so why the sudden axe?
In this comprehensive guide, I'll cover:
- The history and original intent behind the functions
- A quick dive into character encoding basics
- The reasoning and decision to deprecate
- How to view the deprecation warnings
- 3 alternative solutions with examples
- Encoding conversion cheat sheet
- Common troubleshooting tips
I'll also infuse my personal experiences and recommendations as an encoding expert throughout.
A Look Back at the History of These Functions
First, let's rewind and understand why utf8_encode() and utf8_decode() came about…
Back in the PHP 4 era (the stone ages!), built-in encoding functionality was extremely limited. Developers were dealing with bugs caused by mismatches between ISO-8859-1 (also known as Latin-1) and early UTF-8 data.
To solve this, the two functions were added as part of PHP's XML extension, as a quick band-aid for converting between the most common encodings of the time.
Over the years, PHP developers came to rely on them for general UTF-8 conversion needs, even though they only convert between ISO-8859-1 and UTF-8.
Character Encoding Crash Course
Before we look at the deprecation decision, let's review some encoding fundamentals…
Character encoding is the process of mapping characters to numeric values that can be stored and transmitted across platforms and languages. This allows text to be represented digitally.
[Diagram: letters mapped to numbers]
Common encodings like ASCII, ISO 8859-1, UTF-8 and UTF-16 each have their own mapping system. When we convert between encodings, we re-encode the same characters using the byte values of the target encoding.
Mixing encodings without proper conversion causes missing, corrupted, or unintended characters in text – essentially encoding errors.
These two functions aim to solve this for one specific pair: utf8_encode() converts ISO-8859-1 to UTF-8, and utf8_decode() converts UTF-8 back to ISO-8859-1.
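To make the mapping concrete, here is a small sketch that prints the bytes PHP holds for the same character in each encoding. I've written the bytes as escapes so the result doesn't depend on how the source file itself is saved.

```php
<?php
// "é" is code point U+00E9. The byte escapes are written by hand so the
// output does not depend on the encoding of this source file.
$latin1 = "\xE9";       // ISO-8859-1: one byte
$utf8   = "\xC3\xA9";   // UTF-8: two bytes

echo bin2hex($latin1), ' (', strlen($latin1), " byte)\n";   // e9 (1 byte)
echo bin2hex($utf8), ' (', strlen($utf8), " bytes)\n";      // c3a9 (2 bytes)

// Plain ASCII characters are encoded identically in both encodings.
echo bin2hex('A'), "\n";   // 41
```

Same character, different bytes: that difference is all an encoding conversion has to deal with.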
The Deprecation Decision – Why Now?
The PHP core contributors decided deprecation was needed due to a few key factors:
Limited Scope
The functions only handle ISO-8859-1 and UTF-8, not the many other encodings in use today (Windows-1252, Shift_JIS, and so on).
Silent Failures
They fail silently when the input string does not match the expected encoding: utf8_encode() accepts any bytes, so it double-encodes text that is already UTF-8, and utf8_decode() replaces anything it cannot represent with a ? character.
Inappropriate Names
Their names falsely imply that they handle UTF-8 conversion in general.
After 15+ years, better alternatives exist, which is why the functions are deprecated as of PHP 8.2 and scheduled for removal in PHP 9.0.
Developers who call them will now see deprecation notices.
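The silent-failure problem is easiest to see with double encoding. The sketch below assumes the mbstring extension is loaded and uses mb_convert_encoding() with ISO-8859-1 as the source, which performs the same mapping as utf8_encode(), so it runs without touching the deprecated function.

```php
<?php
// Assumes the mbstring extension is loaded.
$utf8 = "\xC3\xA9";   // "é", already valid UTF-8

// Mistake: treating UTF-8 input as if it were ISO-8859-1.
// Each input byte becomes its own character in the output.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8), "\n";     // c3a9
echo bin2hex($double), "\n";   // c383c2a9, which renders as "Ã©"
```

No error, no notice, just mangled text. That is exactly the kind of bug these functions let slip through.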
PHP Emits Deprecation Notices Now
With the knowledge of why they are going away, let's look at what appears if you call them on PHP 8.2 or later:
$utf8 = utf8_encode("Cr\xE8me br\xFBl\xE9e"); // input is ISO-8859-1 bytes
// Deprecated: Function utf8_encode() is deprecated
And decoding:
$text = utf8_decode($utf8);
// Deprecated: Function utf8_decode() is deprecated
The notices are loud and clear, as long as E_DEPRECATED is part of your error_reporting level!
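If you want to find these notices in a large codebase rather than wait to spot them on screen, one option is to collect them with an error handler. This is a minimal sketch: collectDeprecations() is a hypothetical helper name of mine, and the example uses trigger_error() as a stand-in so it behaves the same on any PHP version.

```php
<?php
// Hypothetical helper: runs a callable and returns every deprecation
// message raised while it executed.
function collectDeprecations(callable $fn): array
{
    $messages = [];
    set_error_handler(
        function (int $errno, string $errstr) use (&$messages): bool {
            $messages[] = $errstr;
            return true;   // handled; skip PHP's default output
        },
        E_DEPRECATED | E_USER_DEPRECATED
    );
    try {
        $fn();
    } finally {
        restore_error_handler();
    }
    return $messages;
}

// Stand-in for a call to a deprecated function.
$seen = collectDeprecations(function () {
    trigger_error('stand-in deprecation', E_USER_DEPRECATED);
});
echo implode("\n", $seen), "\n";   // stand-in deprecation
```

On PHP 8.2 or later, passing a closure that calls utf8_encode() should capture the engine's own deprecation message in the same way.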
Solution 1: mb_convert_encoding()
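The recommended alternative is the mb_convert_encoding() function from the mbstring extension. It performs the same conversion but supports many more encodings (mb_list_encodings() shows what your build offers). Note that the target encoding comes before the source encoding:
$utf8 = mb_convert_encoding("Cr\xE8me br\xFBl\xE9e", 'UTF-8', 'ISO-8859-1');
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
By explicitly stating the input and output encodings, you keep the conversion predictable and the characters stay true.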
Pros:
- Fully featured
- Bundled with PHP (the mbstring extension must be enabled)
- Actively maintained
- Handles a wide range of character encodings!
Cons:
- Slightly more verbose
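If the verbosity bothers you, a pair of thin wrappers keeps call sites short and adds the input check the old functions never had. This is a sketch under assumptions: the function names are my own, and the mbstring extension must be loaded.

```php
<?php
// Hypothetical helper names; assumes the mbstring extension is loaded.

function latin1ToUtf8(string $latin1): string
{
    // Every byte is a valid ISO-8859-1 character, so no validation is needed.
    return mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
}

function utf8ToLatin1(string $utf8): string
{
    // Reject malformed input instead of silently producing garbage.
    if (!mb_check_encoding($utf8, 'UTF-8')) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }
    return mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
}

$utf8 = latin1ToUtf8("Cr\xE8me br\xFBl\xE9e");
echo bin2hex($utf8), "\n";
echo bin2hex(utf8ToLatin1($utf8)), "\n";
```

Swapping existing utf8_encode() and utf8_decode() calls for wrappers like these keeps the migration mostly mechanical.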
Solution 2: Intl Extension
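The Intl extension provides UConverter::transcode() with similar encoding conversion capabilities. As with mb_convert_encoding(), the target encoding comes before the source encoding:
$utf8 = UConverter::transcode("Cr\xE8me br\xFBl\xE9e", 'UTF-8', 'ISO-8859-1');
$latin1 = UConverter::transcode($utf8, 'ISO-8859-1', 'UTF-8');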
Pros:
- Clean interface
- Part of Internationalization extension suite
Cons:
- The intl extension needs to be installed
- Supported encodings depend on the underlying ICU library
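Because the intl extension is not always installed, it can help to check for it at runtime. The sketch below uses UConverter::transcode() from intl when the class exists and otherwise falls back to iconv(); convertEncoding() is a hypothetical name I've picked for illustration.

```php
<?php
// Hypothetical helper: prefers intl's UConverter when it is available
// and falls back to iconv() otherwise.
function convertEncoding(string $text, string $to, string $from): string
{
    if (class_exists('UConverter')) {
        $result = UConverter::transcode($text, $to, $from);
    } else {
        $result = iconv($from, $to, $text);
    }
    if ($result === false) {
        throw new RuntimeException("Could not convert from $from to $to.");
    }
    return $result;
}

echo bin2hex(convertEncoding("\xE9", 'UTF-8', 'ISO-8859-1')), "\n";   // c3a9
```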
Solution 3: iconv()
The iconv() function has been around since the PHP 4 days with wide support for encodings. Here the source encoding comes first and the string comes last:
$utf8 = iconv('ISO-8859-1', 'UTF-8', "Cr\xE8me br\xFBl\xE9e");
$latin1 = iconv('UTF-8', 'ISO-8859-1', $utf8);
Pros:
- Lightweight
- Long history
Cons:
- Less intuitive argument order (source encoding first, string last)
- Limited error handling (returns false and raises a notice on failure)
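One way to soften the error-handling weakness is to turn the false return value into an exception. This is a minimal sketch with a hypothetical iconvStrict() wrapper; whether a given unmappable character actually fails depends on the iconv implementation PHP was built against.

```php
<?php
// Hypothetical wrapper that converts iconv()'s false return into an exception.
function iconvStrict(string $from, string $to, string $text): string
{
    // @ suppresses the notice; the false return value is checked instead.
    $result = @iconv($from, $to, $text);
    if ($result === false) {
        throw new RuntimeException("iconv could not convert from $from to $to.");
    }
    return $result;
}

echo bin2hex(iconvStrict('ISO-8859-1', 'UTF-8', "\xE9")), "\n";   // c3a9

try {
    // U+20AC (euro sign) has no ISO-8859-1 equivalent. glibc's iconv
    // reports an error here; other implementations may substitute instead.
    iconvStrict('UTF-8', 'ISO-8859-1', "\xE2\x82\xAC");
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```

iconv() also accepts //TRANSLIT and //IGNORE suffixes on the target encoding, but their results vary between platforms, so test them where you deploy.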
Encoding Conversion Cheat Sheet
Here is a quick reference for converting between ISO-8859-1/Latin-1 and UTF-8 with all three alternatives:
Function | Latin-1 > UTF-8 | UTF-8 > Latin-1 |
---|---|---|
mb_convert_encoding | mb_convert_encoding($str, 'UTF-8', 'ISO-8859-1'); | mb_convert_encoding($str, 'ISO-8859-1', 'UTF-8'); |
UConverter::transcode | UConverter::transcode($str, 'UTF-8', 'ISO-8859-1'); | UConverter::transcode($str, 'ISO-8859-1', 'UTF-8'); |
iconv | iconv('ISO-8859-1', 'UTF-8', $str); | iconv('UTF-8', 'ISO-8859-1', $str); |
Troubleshooting Tips
When converting encodings, here are some common issues and solutions:
Problem: Warning message about invalid characters
Solution: The input string does not match the expected encoding. Double-check your source encoding: mb_check_encoding() confirms whether a string is valid in a given encoding, while mb_detect_encoding() only makes a best guess.
Problem: Output contains ? or � replacement characters
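Solution: This normally means the target encoding has no equivalent for some characters (ISO-8859-1 has no euro sign, for example). Keep the data in UTF-8, or pick a target encoding that covers the characters you need.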
Problem: No warnings but scrambled text
Solution: One of the encodings is incorrect, or the text was converted twice. Print or log the raw bytes (for example with bin2hex()) before and after the conversion to test.
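For that kind of before-and-after logging, I find a tiny helper like the one below handy. It's a sketch only: describeBytes() is a hypothetical name, and it assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical debugging helper; assumes the mbstring extension is loaded.
function describeBytes(string $s): string
{
    return sprintf(
        'bytes=%d hex=%s valid_utf8=%s',
        strlen($s),
        bin2hex($s),
        mb_check_encoding($s, 'UTF-8') ? 'yes' : 'no'
    );
}

echo describeBytes("\xE9"), "\n";               // bytes=1 hex=e9 valid_utf8=no
echo describeBytes("\xC3\xA9"), "\n";           // bytes=2 hex=c3a9 valid_utf8=yes
echo describeBytes("\xC3\x83\xC2\xA9"), "\n";   // bytes=4 hex=c383c2a9 valid_utf8=yes
```

The last line is the double-encoded case: it is perfectly valid UTF-8, so a validity check alone won't catch it, but the byte count and hex dump give it away.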
Closing Thoughts
I'm hopeful this article has prepped you to upgrade your own code! To recap:
- utf8_encode() and utf8_decode() served their purpose long ago but were limited in scope
- The core PHP team deprecated them to promote more robust alternatives
- Always double check your input encodings when doing conversions
- Leverage mb_convert_encoding() for future-proof needs!
Contact me with any other encoding hot takes or questions. Happy coding!