So we know that a lot changed. What do we need to look for when
migrating an extension?
zval
Let's start from the inside and work our way out. Many of the
changes that matter when working on an extension are
fundamentally type-related, so let's start with the type that
underpins all of PHP.
typedef struct {
zend_uchar type;
zend_uint refcount__gc;
zend_uchar is_ref__gc;
union {
long lval;
double dval;
struct {
char *val;
int len;
} str;
HashTable *ht;
zend_object_value obj;
} value;
} zval;
Here's a simplified version of what the zval struct looks like in
PHP 5. The important thing to remember is that each type
corresponds to a member in the value union. Many of these values
have changed: either in structure or in type.
typedef struct {
zend_value value;
union {
struct {
zend_uchar type;
...
}
}
} zval;
Here's a simplified version of the PHP 7 zval. We'll talk about
the specifics of the types that have changed representation after
this, but I want you to note that the refcount and is_ref fields
are gone. zvals are passed around by value rather than reference,
so simple types don't need to be refcounted, and types that do
need to be are refcounted within their values rather than the
zval itself.
IS_LONG
PHP 5
long lval;
PHP 7
zend_long lval;
#if defined(__LP64__) || defined(_LP64) || defined(_WIN64)
typedef int64_t zend_long;
#else
typedef int32_t zend_long;
#endif
Integers are now this zend_long type, which means that integers
are now consistently 64 bit on all 64 bit platforms, including
Windows (which used to be an oddball). This is good, but think
about what happens if you're on Win64 and use long (which is 32
bit there) with zend_parse_parameters(): the "l" specifier writes
a 64 bit zend_long through your pointer, so it overruns the
32 bit variable.
IS_BOOL
PHP 5
long lval;
PHP 7
#define IS_FALSE 2
#define IS_TRUE 3
ZEND_API int zend_is_true(zval *op);
IS_BOOL is gone altogether! In PHP 5, it used the lval to
indicate whether it was true or false. In PHP 7, false and true
are separate types. The ZVAL_BOOL macro still exists for setting,
but to check the value you either have to check the type or call
zend_is_true().
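Here's a small standalone model of what "check the type" means in practice. The tag values for false and true match the slide (and 4 is IS_LONG in PHP 7), but demo_tagged and the demo_* helpers are hypothetical names for illustration only.

```c
#include <assert.h>

/* Tag values mirror PHP 7; the names are made up for this sketch. */
enum { DEMO_IS_FALSE = 2, DEMO_IS_TRUE = 3, DEMO_IS_LONG = 4 };

typedef struct {
    long          lval;  /* unused for booleans in the PHP 7 layout */
    unsigned char type;
} demo_tagged;

/* Models ZVAL_BOOL(): the truth value is stored in the type itself. */
static void demo_set_bool(demo_tagged *zv, int b)
{
    zv->type = b ? DEMO_IS_TRUE : DEMO_IS_FALSE;
}

/* Replaces the old "type == IS_BOOL" test. */
static int demo_is_bool(const demo_tagged *zv)
{
    return zv->type == DEMO_IS_FALSE || zv->type == DEMO_IS_TRUE;
}

/* Replaces reading lval: compare the type instead. */
static int demo_bool_value(const demo_tagged *zv)
{
    assert(demo_is_bool(zv));
    return zv->type == DEMO_IS_TRUE;
}
```

Any PHP 5 code that tested for IS_BOOL and then read the lval needs both halves rewritten along these lines.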
IS_STRING
PHP 5
struct {
char *val;
int len;
} str;
PHP 7
zend_string *str;
Let's talk about a more interesting one. Three types are now
pointers to other structures with their own garbage collection.
The Zend Engine has retained the Z_STRLEN and Z_STRVAL macros,
but there's now a string structure that gets used throughout the
engine, not just for zvals. Let's look at it in more detail...
typedef struct {
zend_refcounted_h gc;
zend_ulong h;
size_t len;
char val[1];
} zend_string;
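Key points: lengths are now size_t, not signed int, as $DEITY
intended. Garbage collection now takes place within the
structure. h is a cached hash value. The structure is variable
length: val is declared with one element, but the allocation
extends far enough to hold the whole string.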
zend_string *zend_string_alloc(size_t len, int persistent);
zend_string *zend_string_init(const char *s, size_t len, int pers);
zend_string *zend_string_dup(zend_string *s, int persistent);
void zend_string_release(zend_string *s);
uint32_t zend_string_addref(zend_string *s);
uint32_t zend_string_delref(zend_string *s);
#define ZSTR_VAL(zstr) (zstr)->val
#define ZSTR_LEN(zstr) (zstr)->len
A new set of functions has been added to deal with zend_strings.
These are the most important ones (the full set is in
zend_string.h). Again, note that reference counting is done on
the string, not the zval, so it has "methods" to deal with that.
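If it helps to see the shape of it, here's a standalone sketch of a refcounted, variable length string in the same spirit. It uses malloc and a C99 flexible array member rather than the engine's allocator and val[1], and every demo_* name is a hypothetical stand-in, not the real zend_string API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for zend_string: header and bytes share one allocation. */
typedef struct {
    uint32_t refcount;  /* lives in the string, not in whatever points at it */
    size_t   len;       /* excludes the terminating NUL */
    char     val[];     /* len + 1 bytes follow the header */
} demo_string;

/* Models zend_string_init(): always copies the caller's bytes. */
static demo_string *demo_string_init(const char *s, size_t len)
{
    demo_string *str = malloc(sizeof(demo_string) + len + 1);
    if (str == NULL) {
        return NULL;
    }
    str->refcount = 1;
    str->len = len;
    memcpy(str->val, s, len);
    str->val[len] = '\0';
    return str;
}

/* Models zend_string_addref(): a second owner shares the same bytes. */
static uint32_t demo_string_addref(demo_string *str)
{
    return ++str->refcount;
}

/* Models zend_string_release(): frees on the last release.
 * Returns 1 if the string was freed, 0 if other owners remain. */
static int demo_string_release(demo_string *str)
{
    assert(str->refcount > 0);
    if (--str->refcount == 0) {
        free(str);
        return 1;
    }
    return 0;
}
```

The pattern to take away is that every owner pairs an addref (or an init) with a release, and nobody frees the bytes directly.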
PHP 5
RETURN_STRING(str, duplicate)
ZVAL_STRING(zv, str, duplicate)
add_assoc_string(zv, key, str, duplicate)
PHP 7
RETURN_STRING(str)
ZVAL_STRING(zv, str)
add_assoc_string(zv, key, str)
Most macros and functions that dealt with setting strings had
parameters indicating whether you wanted to duplicate the input.
Those are now gone, since they have to create a zend_string
anyway and will always duplicate. You can override this by
instantiating the string directly using the zend_string API, but
why would you?
#if ZEND_MODULE_API_NO < 20151012
#undef ZVAL_STRING
#define ZVAL_STRING(zv, str) do { \
const char *__s = (str); \
int __l = strlen(__s); \
zval *__z = (zv); \
Z_STRLEN_P(__z) = __l; \
Z_STRVAL_P(__z) = estrndup(__s, __l); \
Z_TYPE_P(__z) = IS_STRING; \
} while (0)
#endif
Again, if you're supporting both versions, your options are kind
of problematic. You're going to have to have compatibility
wrappers or redefine the macros (my preference, but make sure you
do it after including all PHP headers!). The downside is that
it's ugly, but you can basically crib the implementations from
the PHP 5.6 source. I'm going to show you how the sausage is
made, but there are wrappers available for some of this.
IS_OBJECT
PHP 5
typedef struct {
zend_object_handle handle;
const zend_object_handlers *handlers;
} zend_object_value;
zend_object_value obj;
PHP 7
zend_object *obj;
Another new type! I'm going to talk more about class and object
handling later, but let's focus for now on the representation. In
PHP 5, zend_object_value is a small inline structure with an
object handle, which is an index into a hash table.
typedef struct {
zend_refcounted_h gc;
uint32_t handle;
zend_class_entry *ce;
const zend_object_handlers *handlers;
HashTable *properties;
zval properties_table[1];
} zend_object;
You'll probably rarely poke at this structure directly, but
again: refcounting is done on the object. The class
entry and properties are now inline, which improves caching and
performance. Note that this is variable length, though: this is
important if you're overriding the create_object handler, and
I'll talk about it later. I won't get into the API, because it
hasn't changed much.
IS_RESOURCE
PHP 5
long lval;
PHP 7
zend_resource *res;
Finally, resources change from being indexes stored in the long
value to being pointers to their own structures.
typedef struct {
zend_refcounted_h gc;
int handle;
int type;
void *ptr;
} zend_resource;
This one's actually a huge improvement: the resource type is now
kept inline rather than having to poke through a murky API. The
bad news is that the API for resources in general has changed,
which I'll talk more about later.
#define IS_UNDEF 0
#define IS_REFERENCE 10
zend_reference *ref;
Finally, there are a bunch of new types. The two that are
important are UNDEF, which is for undefined variables (as the
name might suggest), and REFERENCE, which replaces the built in
is_ref field in the zval with a pointer to a refcounted
structure (remembering the earlier point about not having
refcounting in the zval itself).
Let's look at more concrete things you need to audit. We'll start
with parameter parsing, since that's not going to be caught by
the compiler.
PHP 5
char *str;
int len;
zend_parse_parameters(ZEND_NUM_ARGS(), "s", &str, &len);
PHP 7
char *str;
size_t len;
zend_parse_parameters(ZEND_NUM_ARGS(), "s", &str, &len);
For example: as I mentioned earlier, string lengths are now
size_t. If you don't change the length variable to size_t, you
get interesting looking segfaults on 64 bit platforms. This one's
insidious.
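Here's a standalone sketch of why this goes wrong. It fakes a stack frame with a struct so the overrun stays inside memory we own; demo_frame, demo_store_len and demo_len_store_clobbers are hypothetical names, and the real damage depends on your compiler's stack layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_CANARY 0x5a5a5a5a

/* A pretend stack frame: a PHP 5 style int length with neighbours after it. */
typedef struct {
    int len;
    int neighbour[3];
} demo_frame;

/* Models what the "s" specifier does on PHP 7: it stores sizeof(size_t)
 * bytes at the address it was given, whatever the caller declared there. */
static void demo_store_len(void *out, size_t len)
{
    memcpy(out, &len, sizeof(len));
}

/* Returns 1 if storing a length into frame.len spilled into the neighbour. */
static int demo_len_store_clobbers(void)
{
    demo_frame frame = { 0, { DEMO_CANARY, DEMO_CANARY, DEMO_CANARY } };

    /* The frame is deliberately big enough that the store stays inside it. */
    assert(sizeof(frame) >= sizeof(size_t));
    demo_store_len(&frame, 3);  /* &frame is also the address of frame.len */
    return frame.neighbour[0] != DEMO_CANARY;
}
```

Where size_t is wider than int, the neighbouring slot gets trampled. In a real extension that neighbour is whatever else happens to live on your stack.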
#if ZEND_MODULE_API_NO >= 20151012
typedef size_t zend_string_len_t;
#else
typedef int zend_string_len_t;
#endif
char *str;
zend_string_len_t len;
zend_parse_parameters(ZEND_NUM_ARGS(), "s", &str, &len);
How do you deal with this if you want to support both versions?
You can do some macro and typedef magic.
#if ZEND_MODULE_API_NO < 20151012
typedef long zend_long;
#endif
zend_long l;
zend_parse_parameters(ZEND_NUM_ARGS(), "l", &l);
I'd do something similar for zend_long too.
Arrays aren't hugely different, but there are two changes of
note, and one will bite you silently if you're not careful.
PHP 5
zend_hash_update(ht, "key", 4, &zv, sizeof(zval *), NULL);
zend_hash_find(ht, "key", 4, &zv);
PHP 7
zend_string *key = zend_string_init("key", 3, 0);
zend_hash_update(ht, key, zv);
zv = zend_hash_find(ht, key);
zend_string_release(key);
This is a good'un. Array keys in PHP 5 included the null
terminator in their length because… hell if I know. In PHP
7, they don't (partly because we're using zend_strings). You'll
also note that the API has changed significantly (for the better,
since it removes a bunch of parameters nobody ever used).
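The off-by-one is easy to get wrong while auditing, so here's a tiny standalone sketch of the two conventions. The DEMO_* macros and the helper are hypothetical names for illustration; the macros only work on string literals.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* For string literals only: sizeof counts the trailing NUL. */
#define DEMO_PHP5_KEY_LEN(lit) (sizeof(lit))      /* what PHP 5 wants */
#define DEMO_PHP7_KEY_LEN(lit) (sizeof(lit) - 1)  /* what PHP 7 wants */

/* Convert a length written for a PHP 5 hash call into the PHP 7 one. */
static size_t demo_php7_len_from_php5(size_t php5_len)
{
    assert(php5_len > 0);  /* a PHP 5 key length always counts the NUL */
    return php5_len - 1;
}
```

If you get this wrong you don't crash: lookups just quietly miss, because the key with the extra byte isn't the same key.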
IS_PTR
zend_string *key;
my_struct *ptr;
zend_hash_update_ptr(ht, key, ptr);
ptr = (my_struct *) zend_hash_find_ptr(ht, key);
There's another aspect to this too: you'll have noticed that we
didn't provide the size. That's because HashTables now store
zvals only. To store a raw pointer, you use a parallel API that
internally wraps the pointer in a zval with the new IS_PTR type.
zval *
compat_zend_hash_find(HashTable *ht, const char *key, size_t len) {
#if ZEND_MODULE_API_NO >= 20151012
zend_string *zs = zend_string_init(key, len, 0);
zval *val = zend_hash_find(ht, zs);
zend_string_release(zs);
return val;
#else
/* PHP 5 tables hold zval *, so the lookup yields a zval **. */
zval **val = NULL;
int res = zend_hash_find(ht, key, len + 1, (void **) &val);
return (res == SUCCESS) ? *val : NULL;
#endif
}
This one's tricky to shim between versions. So far, everyone I've
seen who's done it has written wrappers with different names
— you can add the _ptr functions easily enough to PHP 5,
but that doesn't help with the other API changes. You need to
audit all zend_hash function calls and figure out if you want to
DIY or pull in a compatibility helper.
HashPosition pos;
ulong num_key;
char *key;
uint key_len;
zval **zv_pp;
zend_hash_internal_pointer_reset_ex(&ht, &pos);
while (zend_hash_get_current_data_ex(&ht, (void **) &zv_pp, &pos)
== SUCCESS) {
if (zend_hash_get_current_key_ex(&ht, &key, &key_len,
&num_key, 0, &pos) ==
HASH_KEY_IS_STRING) {
...
}
/* Without this the loop never advances. */
zend_hash_move_forward_ex(&ht, &pos);
}
One amazing new feature that I want to highlight, even though
it's not directly a migration topic: HashTables now have these
incredible iteration macros. If you've had to iterate an array
before, you'll understand why this is a big deal. Here's the old
code...
zend_ulong num_key;
zend_string *key;
zval *val;
ZEND_HASH_FOREACH_KEY_VAL(ht, num_key, key, val) {
if (key) {
...
}
} ZEND_HASH_FOREACH_END();
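If you're wondering how a macro can open a loop that your own braces then sit inside, here's a standalone sketch of the paired-macro trick over a toy table. demo_table, DEMO_FOREACH_KEY_VAL and DEMO_FOREACH_END are hypothetical; the real macros live in zend_hash.h and do rather more (skipping deleted slots, for one).

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char   *key;  /* NULL for integer keys */
    unsigned long h;    /* integer key */
    int           val;
} demo_bucket;

typedef struct {
    demo_bucket *buckets;
    size_t       used;
} demo_table;

/* The opening macro starts a do-block and a for loop and leaves both open. */
#define DEMO_FOREACH_KEY_VAL(ht, _h, _key, _val) do { \
        demo_table *_t = (ht); \
        size_t _i; \
        for (_i = 0; _i < _t->used; _i++) { \
            _h = _t->buckets[_i].h; \
            _key = _t->buckets[_i].key; \
            _val = &_t->buckets[_i].val;

/* The closing macro supplies the braces the opening one left dangling. */
#define DEMO_FOREACH_END() \
        } \
    } while (0)

/* Sum the values stored under string keys, skipping integer keys. */
static int demo_sum_string_keyed(demo_table *ht)
{
    unsigned long h;
    const char *key;
    int *val;
    int sum = 0;

    assert(ht != NULL);
    DEMO_FOREACH_KEY_VAL(ht, h, key, val) {
        (void) h;
        if (key) {
            sum += *val;
        }
    } DEMO_FOREACH_END();
    return sum;
}
```

This is also why forgetting the END macro gives you a wall of baffling brace errors rather than a helpful message.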
Let's talk about resources, those weird holdovers from the old
days. As I mentioned earlier, they're actually a fair bit nicer
to use, but that doesn't necessarily mean that you want to.
PHP 5
int zend_register_resource(zval *zv, void *ptr, int type);
void *zend_fetch_resource(zval **id, int default_id,
const char *name, int *found_type,
int num_types, ...);
int zend_list_delete(int id);
PHP 7
zend_resource *zend_register_resource(void *ptr, int type);
void *zend_fetch_resource(zend_resource *res, const char *name,
int type);
void *zend_fetch_resource_ex(zval *res, const char *name,
int type);
int zend_list_close(zend_resource *res);
There was a set of macros on PHP 5 that mapped to the underlying
functions. As you can see, the functions have changed a tonne for
the better, but you can't really shim them in any meaningful way,
particularly since zend_register_resource() changed the zval on
PHP 5 but doesn't on PHP 7. On the bright side, the basic
workflow hasn't really changed: you register a pointer, fetch it,
and delete/close it.
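The register, fetch, close workflow can be sketched standalone like this. It's a toy model of a resource that carries its type inline, with hypothetical demo_* names; the real functions also deal with the resource list, destructors and error messages, none of which are modelled here.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for zend_resource: the type travels with the pointer. */
typedef struct {
    int   type;
    void *ptr;
} demo_resource;

/* Register: remember the pointer and what kind of thing it is. */
static void demo_register_resource(demo_resource *res, void *ptr, int type)
{
    assert(res != NULL);
    res->ptr = ptr;
    res->type = type;
}

/* Fetch: hand the pointer back only if the caller asked for the right type. */
static void *demo_fetch_resource(const demo_resource *res, int type)
{
    if (res == NULL || res->type != type) {
        return NULL;
    }
    return res->ptr;
}

/* Close: the wrapper survives, but it no longer matches any type. */
static void demo_close_resource(demo_resource *res)
{
    assert(res != NULL);
    res->ptr = NULL;
    res->type = -1;
}
```

Whichever compatibility route you pick, the thing to audit is that every fetch checks for NULL: a wrong type or a closed resource both come back that way.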
#if ZEND_MODULE_API_NO >= 20151012
#define ZEND_REGISTER_RESOURCE(zv, ptr, type) \
ZVAL_RES(zv, zend_register_resource(ptr, type))
/* default_id is accepted for source compatibility and ignored. */
#define ZEND_FETCH_RESOURCE(rsrc, rsrc_type, zv, default_id, name, type) \
rsrc = (rsrc_type) zend_fetch_resource_ex(zv, name, type)
#define ZEND_CLOSE_RESOURCE(zv) \
zend_list_close(Z_RES_P(zv))
#else
#define ZEND_CLOSE_RESOURCE(zv) \
zend_list_delete(Z_LVAL_P(zv))
#endif
Your options are either to write a wrapper like the hashtable
wrapper, or reimplement the macros on PHP 7. I personally prefer
the wrapper option (and implemented it in pecl-compat), but
here's a rough version of the macros if you'd prefer that (the
ignored values are unfortunate).
Let's talk about objects. Basic object handling is largely
unchanged, mercifully.
PHP 5
zval *zv;
zv = zend_read_property(ce, obj, name, strlen(name), 0);
PHP 7
zval rv;
zval *zv;
zv = zend_read_property(ce, obj, name, strlen(name), 0, &rv);
The one common API that has changed a bit is reading properties.
In PHP 7, you have to provide the storage for the returned value
(this is only used if there's a __get method or a custom
read_property, and you should use the return value "zv" and not
access "rv").
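Why use the return value rather than rv? Here's a standalone model of the contract, using ints in place of zvals. demo_object and demo_read_property are hypothetical; the point is only that the scratch space is filled in on one path and ignored on the other.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int has_stored;  /* is there a real property slot? */
    int stored;      /* the slot's value */
    int computed;    /* what a __get-style handler would produce */
} demo_object;

/* A stored property is returned in place and rv is left untouched.
 * A computed one is written into rv, and rv is what comes back. */
static int *demo_read_property(demo_object *obj, int *rv)
{
    assert(obj != NULL && rv != NULL);
    if (obj->has_stored) {
        return &obj->stored;
    }
    *rv = obj->computed;
    return rv;
}
```

In the stored case rv still holds whatever it held before the call, so reading it directly gives you garbage exactly when the property is an ordinary one.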
zval *
compat_zend_read_property(zend_class_entry *ce, zval *obj,
const char *name, int name_length,
int silent, zval *rv TSRMLS_DC) {
#if ZEND_MODULE_API_NO >= 20151012
return zend_read_property(ce, obj, name, name_length,
silent, rv);
#else
(void) rv;
return zend_read_property(ce, obj, name, name_length,
silent TSRMLS_CC);
#endif
}
As zend_read_property isn't a macro, you'll have to add a shim.
I'd go with a little inline function or macro, assuming you're
writing your own.
As I mentioned earlier, there is a difference with objects with
custom allocators because the zend_object struct is variable
length (to cope with properties).
PHP 5
typedef struct {
zend_object std;
my_struct *ptr;
} my_object;
zend_object_value my_object_new(zend_class_entry *ce TSRMLS_DC) {
my_object *intern;
zend_object_value retval;
intern = emalloc(sizeof(my_object));
/* ... */
retval.handle = zend_objects_store_put(intern,
(zend_objects_store_dtor_t) zend_objects_destroy_object,
(zend_objects_free_object_storage_t) my_object_free,
NULL TSRMLS_CC);
retval.handlers = zend_get_std_object_handlers();
return retval;
}
In PHP 5, the start of a create_object handler looks like this
(in general, they have lots of boilerplate). You allocate the
structure, and then you later register it in the object store and
return that value. (Also the only slide with smaller text.
Sorry.)
PHP 5
typedef struct {
zend_object std;
my_struct *ptr;
} my_object;
my_object *my_object_get(zval *zv TSRMLS_DC) {
return (my_object *)
zend_object_store_get_object(zv TSRMLS_CC);
}
Retrieving an object was easy: you'd just cast what you got back
from the object store.
PHP 7
typedef struct {
my_struct *ptr;
zend_object std;
} my_object;
zend_object *my_object_new(zend_class_entry *ce) {
my_object *intern;
intern = emalloc(sizeof(my_object) +
zend_object_properties_size(ce));
/* ... */
return &intern->std;
}
The PHP 7 version of create_object has three major differences.
Firstly, we've reordered the structure fields: the zend_object
has to be the last element now because it's variable length.
Secondly, there's the emalloc: you have to add the variable space
required for the class's declared properties. Finally, you don't
call zend_objects_store_put any more: you return a pointer to the
zend_object buried within your structure.
PHP 7
typedef struct {
my_struct *ptr;
zend_object std;
} my_object;
my_object *my_object_get(zval *zv) {
zend_object *obj = Z_OBJ_P(zv);
return (my_object *)
((char *)(obj) - XtOffsetOf(my_object, std));
}
Oh. Oh no.
The getter is more complicated than before. The zend_object
pointer within the zval points partway through our struct, so we
use XtOffsetOf to calculate how far back we have to go. The cast
to char * is there because offsetof returns a value in bytes, and
arithmetic on a char * moves in bytes. We use XtOffsetOf instead
of offsetof because of ancient compilers.
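Here's the same pointer arithmetic as a standalone sketch you can compile without PHP. The demo_* types are hypothetical stand-ins, plain offsetof stands in for XtOffsetOf, and calloc stands in for the engine's allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for zend_object. */
typedef struct {
    int handle;
} demo_zend_object;

/* Custom data first, embedded object last, as in the PHP 7 layout. */
typedef struct {
    int              payload;
    demo_zend_object std;
} demo_my_object;

/* Models create_object: allocate the wrapper plus trailing space,
 * then hand out a pointer to the object embedded inside it. */
static demo_zend_object *demo_object_new(int payload, size_t extra_bytes)
{
    demo_my_object *intern = calloc(1, sizeof(demo_my_object) + extra_bytes);
    if (intern == NULL) {
        return NULL;
    }
    intern->payload = payload;
    return &intern->std;
}

/* Models the getter: step back from the embedded object to the wrapper. */
static demo_my_object *demo_object_from_std(demo_zend_object *obj)
{
    assert(obj != NULL);
    return (demo_my_object *) ((char *) obj - offsetof(demo_my_object, std));
}

/* Freeing has to start from the wrapper, not from the embedded object. */
static void demo_object_free(demo_zend_object *obj)
{
    free(demo_object_from_std(obj));
}
```

The last function is the important one: the pointer everybody passes around isn't the pointer the allocator gave you, so anything that frees the object needs to know the offset.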
PHP 7
zend_class_entry *my_object_ce;
zend_object_handlers my_object_handlers;
PHP_MINIT_FUNCTION(my_extension) {
zend_class_entry ce;
INIT_CLASS_ENTRY(ce, "My\\Object", NULL);
my_object_ce = zend_register_internal_class_ex(&ce, NULL);
memcpy(&my_object_handlers, &std_object_handlers,
sizeof(zend_object_handlers));
my_object_handlers.offset = XtOffsetOf(my_object, std);
return SUCCESS;
}
There's one other wrinkle, too, in PHP 7. You have to set the new
offset field on the object handlers structure to the offset of
your zend_object within your custom structure. This allows PHP to
free your entire structure when the object is deleted (although
you still have to free anything else you've allocated and pointed
to). The reality is that you're going to have two versions of
stuff to support both versions.