/* stb_image - v2.08 - public domain image loader - http://nothings.org/stb_image.h
                                     no warranty implied; use at your own risk

   Do this:
      #define STB_IMAGE_IMPLEMENTATION
   before you include this file in *one* C or C++ file to create the implementation.

   // i.e. it should look like this:
   #include ...
   #include ...
   #include ...
   #define STB_IMAGE_IMPLEMENTATION
   #include "stb_image.h"

   You can #define STBI_ASSERT(x) before the #include to avoid using assert.h.
   And #define STBI_MALLOC, STBI_REALLOC, and STBI_FREE to avoid using malloc, realloc, free.


   QUICK NOTES:
      Primarily of interest to game developers and other people who can
          avoid problematic images and only need the trivial interface

      JPEG baseline & progressive (12 bpc/arithmetic not supported, same as stock IJG lib)
      PNG 1/2/4/8-bit-per-channel (16 bpc not supported)

      TGA (not sure what subset, if a subset)
      BMP non-1bpp, non-RLE
      PSD (composited view only, no extra channels, 8/16 bit-per-channel)

      GIF (*comp always reports as 4-channel)
      HDR (Radiance rgbE format)
      PIC (Softimage PIC)
      PNM (PPM and PGM binary only)

      Animated GIF still needs a proper API, but here's one way to do it:
          http://gist.github.com/urraka/685d9a6340b26b830d49

      - decode from memory or through FILE (define STBI_NO_STDIO to remove code)
      - decode from arbitrary I/O callbacks
      - SIMD acceleration on x86/x64 (SSE2) and ARM (NEON)

   Full documentation under "DOCUMENTATION" below.


   Revision 2.00 release notes:

      - Progressive JPEG is now supported.

      - PPM and PGM binary formats are now supported, thanks to Ken Miller.

      - x86 platforms now make use of SSE2 SIMD instructions for
        JPEG decoding, and ARM platforms can use NEON SIMD if requested.
        This work was done by Fabian "ryg" Giesen. SSE2 is used by
        default, but NEON must be enabled explicitly; see docs.

        With other JPEG optimizations included in this version, we see
        a 2x speedup on a JPEG on an x86 machine, and a 1.5x speedup
        on a JPEG on an ARM machine, relative to previous versions of this
        library. The same results will not obtain for all JPGs and for all
        x86/ARM machines. (Note that progressive JPEGs are significantly
        slower to decode than regular JPEGs.) This doesn't mean that this
        is the fastest JPEG decoder in the land; rather, it brings it
        closer to parity with standard libraries. If you want the fastest
        decode, look elsewhere. (See "Philosophy" section of docs below.)

        See final bullet items below for more info on SIMD.

      - Added STBI_MALLOC, STBI_REALLOC, and STBI_FREE macros for replacing
        the memory allocator. Unlike other stb libraries, these macros don't
        support a context parameter, so if you need to pass a context in to
        the allocator, you'll have to store it in a global or a thread-local
        variable.
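
        A minimal sketch of replacing the allocator (my_alloc, my_realloc,
        and my_free are hypothetical functions you would supply; define all
        three macros or none, since the implementation rejects a partial set):

            #define STBI_MALLOC(sz)        my_alloc(sz)
            #define STBI_REALLOC(p,newsz)  my_realloc(p,newsz)
            #define STBI_FREE(p)           my_free(p)
            #define STB_IMAGE_IMPLEMENTATION
            #include "stb_image.h"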

      - Split existing STBI_NO_HDR flag into two flags, STBI_NO_HDR and
        STBI_NO_LINEAR.
            STBI_NO_HDR:     suppress implementation of .hdr reader format
            STBI_NO_LINEAR:  suppress high-dynamic-range light-linear float API

      - You can suppress implementation of any of the decoders to reduce
        your code footprint by #defining one or more of the following
        symbols before creating the implementation.

            STBI_NO_JPEG
            STBI_NO_PNG
            STBI_NO_BMP
            STBI_NO_PSD
            STBI_NO_TGA
            STBI_NO_GIF
            STBI_NO_HDR
            STBI_NO_PIC
            STBI_NO_PNM   (.ppm and .pgm)

      - You can request *only* certain decoders and suppress all other ones
        (this will be more forward-compatible, as addition of new decoders
        doesn't require you to disable them explicitly):

            STBI_ONLY_JPEG
            STBI_ONLY_PNG
            STBI_ONLY_BMP
            STBI_ONLY_PSD
            STBI_ONLY_TGA
            STBI_ONLY_GIF
            STBI_ONLY_HDR
            STBI_ONLY_PIC
            STBI_ONLY_PNM   (.ppm and .pgm)

        Note that you can define multiples of these, and you will get all
        of them ("only x" and "only y" is interpreted to mean "only x&y").
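
        For instance, a sketch of a build that keeps only the PNG and JPEG
        decoders (the choice of formats is illustrative):

            #define STBI_ONLY_PNG
            #define STBI_ONLY_JPEG
            #define STB_IMAGE_IMPLEMENTATION
            #include "stb_image.h"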

      - If you use STBI_NO_PNG (or _ONLY_ without PNG), and you still
        want the zlib decoder to be available, #define STBI_SUPPORT_ZLIB

      - Compilation of all SIMD code can be suppressed with
            #define STBI_NO_SIMD
        It should not be necessary to disable SIMD unless you have issues
        compiling (e.g. using an x86 compiler which doesn't support SSE
        intrinsics or that doesn't support the method used to detect
        SSE2 support at run-time), and even those can be reported as
        bugs so I can refine the built-in compile-time checking to be
        smarter.

      - The old STBI_SIMD system which allowed installing a user-defined
        IDCT etc. has been removed. If you need this, don't upgrade. My
        assumption is that almost nobody was doing this, and those who
        were will find the built-in SIMD more satisfactory anyway.

      - RGB values computed for JPEG images are slightly different from
        previous versions of stb_image. (This is due to using less
        integer precision in SIMD.) The C code has been adjusted so
        that the same RGB values will be computed regardless of whether
        SIMD support is available, so your app should always produce
        consistent results. But these results are slightly different from
        previous versions. (Specifically, about 3% of available YCbCr values
        will compute different RGB results from pre-1.49 versions by +-1;
        most of the deviating values are one smaller in the G channel.)

      - If you must produce consistent results with previous versions of
        stb_image, #define STBI_JPEG_OLD and you will get the same results
        you used to; however, you will not get the SIMD speedups for
        the YCbCr-to-RGB conversion step (although you should still see
        significant JPEG speedup from the other changes).

        Please note that STBI_JPEG_OLD is a temporary feature; it will be
        removed in future versions of the library. It is only intended for
        near-term back-compatibility use.


   Latest revision history:
      2.08  (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA
      2.07  (2015-09-13) partial animated GIF support
                         limited 16-bit PSD support
                         minor bugs, code cleanup, and compiler warnings
      2.06  (2015-04-19) fix bug where PSD returns wrong '*comp' value
      2.05  (2015-04-19) fix bug in progressive JPEG handling, fix warning
      2.04  (2015-04-15) try to re-enable SIMD on MinGW 64-bit
      2.03  (2015-04-12) additional corruption checking
                         stbi_set_flip_vertically_on_load
                         fix NEON support; fix mingw support
      2.02  (2015-01-19) fix incorrect assert, fix warning
      2.01  (2015-01-17) fix various warnings
      2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
      2.00  (2014-12-25) optimize JPEG, including x86 SSE2 & ARM NEON SIMD
                         progressive JPEG
                         PGM/PPM support
                         STBI_MALLOC,STBI_REALLOC,STBI_FREE
                         STBI_NO_*, STBI_ONLY_*
                         GIF bugfix
      1.48  (2014-12-14) fix incorrectly-named assert()
      1.47  (2014-12-14) 1/2/4-bit PNG support (both grayscale and paletted)
                         optimize PNG
                         fix bug in interlaced PNG with user-specified channel count

   See end of file for full revision history.


 ============================    Contributors    =========================

 Image formats                                Bug fixes & warning fixes
    Sean Barrett (jpeg, png, bmp)                Marc LeBlanc
    Nicolas Schulz (hdr, psd)                    Christpher Lloyd
    Jonathan Dummer (tga)                        Dave Moore
    Jean-Marc Lienher (gif)                      Won Chun
    Tom Seddon (pic)                             the Horde3D community
    Thatcher Ulrich (psd)                        Janez Zemva
    Ken Miller (pgm, ppm)                        Jonathan Blow
    urraka@github (animated gif)                 Laurent Gomila
                                                 Aruelien Pocheville
                                                 Ryamond Barbiero
                                                 David Woo
 Extensions, features                            Martin Golini
    Jetro Lauha (stbi_info)                      Roy Eltham
    Martin "SpartanJ" Golini (stbi_info)         Luke Graham
    James "moose2000" Brown (iPhone PNG)         Thomas Ruf
    Ben "Disch" Wenger (io callbacks)            John Bartholomew
    Omar Cornut (1/2/4-bit PNG)                  Ken Hamada
    Nicolas Guillemot (vertical flip)            Cort Stratton
    Richard Mitton (16-bit PSD)                  Blazej Dariusz Roszkowski
                                                 Thibault Reuille
                                                 Paul Du Bois
                                                 Guillaume George
                                                 Jerry Jansson
                                                 Hayaki Saito
                                                 Johan Duparc
                                                 Ronny Chevalier
 Optimizations & bugfixes                        Michal Cichon
    Fabian "ryg" Giesen                          Tero Hanninen
    Arseny Kapoulkine                            Sergio Gonzalez
                                                 Cass Everitt
                                                 Engin Manap
  If your name should be here but                Martins Mozeiko
  isn't, let Sean know.                          Joseph Thomson
                                                 Phil Jordan
                                                 Nathan Reed
                                                 Michaelangel007@github
                                                 Nick Verigakis

LICENSE

This software is in the public domain. Where that dedication is not
recognized, you are granted a perpetual, irrevocable license to copy,
distribute, and modify this file as you see fit.

*/

#ifndef STBI_INCLUDE_STB_IMAGE_H
#define STBI_INCLUDE_STB_IMAGE_H

// DOCUMENTATION
//
// Limitations:
//    - no 16-bit-per-channel PNG
//    - no 12-bit-per-channel JPEG
//    - no JPEGs with arithmetic coding
//    - no 1-bit BMP
//    - GIF always returns *comp=4
//
// Basic usage (see HDR discussion below for HDR usage):
//    int x,y,n;
//    unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
//    // ... process data if not NULL ...
//    // ... x = width, y = height, n = # 8-bit components per pixel ...
//    // ... replace '0' with '1'..'4' to force that many components per pixel
//    // ... but 'n' will always be the number that it would have been if you said 0
//    stbi_image_free(data);
//
// Standard parameters:
//    int *x       -- outputs image width in pixels
//    int *y       -- outputs image height in pixels
//    int *comp    -- outputs # of image components in image file
//    int req_comp -- if non-zero, # of image components requested in result
//
// The return value from an image loader is an 'unsigned char *' which points
// to the pixel data, or NULL on an allocation failure or if the image is
// corrupt or invalid. The pixel data consists of *y scanlines of *x pixels,
// with each pixel consisting of N interleaved 8-bit components; the first
// pixel pointed to is top-left-most in the image. There is no padding between
// image scanlines or between pixels, regardless of format. The number of
// components N is 'req_comp' if req_comp is non-zero, or *comp otherwise.
// If req_comp is non-zero, *comp has the number of components that _would_
// have been output otherwise. E.g. if you set req_comp to 4, you will always
// get RGBA output, but you can check *comp to see if it's trivially opaque
// because e.g. there were only 3 channels in the source image.
//
// An output image with N components has the following components interleaved
// in this order in each pixel:
//
//     N=#comp     components
//       1           grey
//       2           grey, alpha
//       3           red, green, blue
//       4           red, green, blue, alpha
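//
// As a sketch of how that layout is addressed, assuming 'data', 'x', and 'n'
// come from the basic-usage call above with req_comp == 0 (so N == n), and
// 'row' and 'col' are hypothetical in-range pixel coordinates:
//
//    unsigned char *p = data + ((size_t) row * x + col) * n;
//    // p[0] is grey when n is 1 or 2, and red when n is 3 or 4
//    // p[n-1] is alpha when n is 2 or 4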
//
// If image loading fails for any reason, the return value will be NULL,
// and *x, *y, *comp will be unchanged. The function stbi_failure_reason()
// can be queried for an extremely brief, end-user unfriendly explanation
// of why the load failed. Define STBI_NO_FAILURE_STRINGS to avoid
// compiling these strings at all, and STBI_FAILURE_USERMSG to get slightly
// more user-friendly ones.
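//
// For example, a sketch of reporting a failed load (uses fprintf from
// <stdio.h>; 'filename' is whatever path you are loading):
//
//    int x,y,n;
//    unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
//    if (data == NULL)
//       fprintf(stderr, "could not load %s: %s\n", filename, stbi_failure_reason());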
//
// Paletted PNG, BMP, GIF, and PIC images are automatically depalettized.
//
// ===========================================================================
//
// Philosophy
//
// stb libraries are designed with the following priorities:
//
//    1. easy to use
//    2. easy to maintain
//    3. good performance
//
// Sometimes I let "good performance" creep up in priority over "easy to maintain",
// and for best performance I may provide less-easy-to-use APIs that give higher
// performance, in addition to the easy to use ones. Nevertheless, it's important
// to keep in mind that from the standpoint of you, a client of this library,
// all you care about is #1 and #3, and stb libraries do not emphasize #3 above all.
//
// Some secondary priorities arise directly from the first two, some of which
// make more explicit reasons why performance can't be emphasized.
//
//    - Portable ("ease of use")
//    - Small footprint ("easy to maintain")
//    - No dependencies ("ease of use")
//
// ===========================================================================
//
// I/O callbacks
//
// I/O callbacks allow you to read from arbitrary sources, like packaged
// files or some other source. Data read from callbacks are processed
// through a small internal buffer (currently 128 bytes) to try to reduce
// overhead.
//
// The three functions you must define are "read" (reads some bytes of data),
// "skip" (skips some bytes of data), and "eof" (reports if the stream is at the end).
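//
// A minimal sketch of the three callbacks, reading from a block of memory
// purely for illustration (stbi_load_from_memory already covers that case).
// The 'mem_src' type, the mem_* functions, and 'bytes'/'num_bytes' are
// hypothetical names; memcpy needs <string.h>:
//
//    typedef struct { const unsigned char *data; int size, pos; } mem_src;
//
//    static int mem_read(void *user, char *data, int size)
//    {
//       mem_src *m = (mem_src *) user;
//       int left = m->size - m->pos;
//       if (size > left) size = left;
//       memcpy(data, m->data + m->pos, size);
//       m->pos += size;
//       return size;                       // number of bytes actually read
//    }
//
//    static void mem_skip(void *user, int n)
//    {
//       mem_src *m = (mem_src *) user;
//       m->pos += n;                       // n may be negative ("unget")
//       if (m->pos < 0) m->pos = 0;
//       if (m->pos > m->size) m->pos = m->size;
//    }
//
//    static int mem_eof(void *user)
//    {
//       mem_src *m = (mem_src *) user;
//       return m->pos >= m->size;          // nonzero at end of data
//    }
//
//    int x,y,n;
//    stbi_io_callbacks cb = { mem_read, mem_skip, mem_eof };
//    mem_src src = { bytes, num_bytes, 0 };
//    unsigned char *pixels = stbi_load_from_callbacks(&cb, &src, &x, &y, &n, 0);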
//
// ===========================================================================
//
// SIMD support
//
// The JPEG decoder will try to automatically use SIMD kernels on x86 when
// supported by the compiler. For ARM Neon support, you must explicitly
// request it.
//
// (The old do-it-yourself SIMD API is no longer supported in the current
// code.)
//
// On x86, SSE2 will automatically be used when available based on a run-time
// test; if not, the generic C versions are used as a fall-back. On ARM targets,
// the typical path is to have separate builds for NEON and non-NEON devices
// (at least this is true for iOS and Android). Therefore, the NEON support is
// toggled by a build flag: define STBI_NEON to get NEON loops.
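//
// A sketch of a NEON build (the compiler must already be targeting NEON;
// how that is done depends on your toolchain and is not shown here):
//
//    #define STBI_NEON
//    #define STB_IMAGE_IMPLEMENTATION
//    #include "stb_image.h"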
//
// The output of the JPEG decoder is slightly different from versions before
// SIMD support was introduced (that is, from versions before 1.49). The
// difference is only +-1 in the 8-bit RGB channels, and only on a small
// fraction of pixels. You can force the pre-1.49 behavior by defining
// STBI_JPEG_OLD, but this will disable some of the SIMD decoding path
// and hence cost some performance.
//
// If for some reason you do not want to use any of the SIMD code, or if
// you have issues compiling it, you can disable it entirely by
// defining STBI_NO_SIMD.
//
// ===========================================================================
//
// HDR image support   (disable by defining STBI_NO_HDR)
//
// stb_image now supports loading HDR images in general, and currently
// the Radiance .HDR file format, although the support is provided
// generically. You can still load any file through the existing interface;
// if you attempt to load an HDR file, it will be automatically remapped to
// LDR, assuming gamma 2.2 and an arbitrary scale factor defaulting to 1;
// both of these constants can be reconfigured through this interface:
//
//     stbi_hdr_to_ldr_gamma(2.2f);
//     stbi_hdr_to_ldr_scale(1.0f);
//
// (note, do not use _inverse_ constants; stb_image will invert them
// appropriately).
//
// Additionally, there is a new, parallel interface for loading files as
// (linear) floats to preserve the full dynamic range:
//
//    float *data = stbi_loadf(filename, &x, &y, &n, 0);
//
// If you load LDR images through this interface, those images will
// be promoted to floating point values, run through the inverse of
// constants corresponding to the above:
//
//     stbi_ldr_to_hdr_scale(1.0f);
//     stbi_ldr_to_hdr_gamma(2.2f);
//
// Finally, given a filename (or an open file or memory block--see header
// file for details) containing image data, you can query for the "most
// appropriate" interface to use (that is, whether the image is HDR or
// not), using:
//
//     stbi_is_hdr(char const *filename);
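//
// Putting these together, a minimal sketch that picks the interface per file
// (the branch bodies are placeholders for your own processing):
//
//    int x,y,n;
//    if (stbi_is_hdr(filename)) {
//       float *hdr = stbi_loadf(filename, &x, &y, &n, 0);
//       // ... hdr is NULL on failure; otherwise x*y*n linear floats ...
//       stbi_image_free(hdr);
//    } else {
//       unsigned char *ldr = stbi_load(filename, &x, &y, &n, 0);
//       // ... ldr is NULL on failure; otherwise x*y*n 8-bit components ...
//       stbi_image_free(ldr);
//    }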
//
// ===========================================================================
//
// iPhone PNG support:
//
// By default we convert iphone-formatted PNGs back to RGB, even though
// they are internally encoded differently. You can disable this conversion
// by calling stbi_convert_iphone_png_to_rgb(0), in which case
// you will always just get the native iphone "format" through (which
// is BGR stored in RGB).
//
// Call stbi_set_unpremultiply_on_load(1) as well to force a divide per
// pixel to remove any premultiplied alpha *only* if the image file explicitly
// says there's premultiplied data (currently only happens in iPhone images,
// and only if iPhone convert-to-rgb processing is on).
//


#ifndef STBI_NO_STDIO
#include <stdio.h>
#endif // STBI_NO_STDIO

#define STBI_VERSION 1

enum
{
   STBI_default = 0, // only used for req_comp

   STBI_grey       = 1,
   STBI_grey_alpha = 2,
   STBI_rgb        = 3,
   STBI_rgb_alpha  = 4
};

typedef unsigned char stbi_uc;

#ifdef __cplusplus
extern "C" {
#endif

#ifdef STB_IMAGE_STATIC
#define STBIDEF static
#else
#define STBIDEF extern
#endif

//////////////////////////////////////////////////////////////////////////////
//
// PRIMARY API - works on images of any type
//

//
// load image by filename, open file, or memory buffer
//

typedef struct
{
   int      (*read)  (void *user,char *data,int size);   // fill 'data' with 'size' bytes.  return number of bytes actually read
   void     (*skip)  (void *user,int n);                 // skip the next 'n' bytes, or 'unget' the last -n bytes if negative
   int      (*eof)   (void *user);                       // returns nonzero if we are at end of file/data
} stbi_io_callbacks;

STBIDEF stbi_uc *stbi_load               (char              const *filename,           int *x, int *y, int *comp, int req_comp);
STBIDEF stbi_uc *stbi_load_from_memory   (stbi_uc           const *buffer, int len   , int *x, int *y, int *comp, int req_comp);
STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk  , void *user, int *x, int *y, int *comp, int req_comp);

#ifndef STBI_NO_STDIO
STBIDEF stbi_uc *stbi_load_from_file  (FILE *f,                  int *x, int *y, int *comp, int req_comp);
// for stbi_load_from_file, file pointer is left pointing immediately after image
#endif

#ifndef STBI_NO_LINEAR
   STBIDEF float *stbi_loadf                 (char const *filename,           int *x, int *y, int *comp, int req_comp);
   STBIDEF float *stbi_loadf_from_memory     (stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp);
   STBIDEF float *stbi_loadf_from_callbacks  (stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp);

   #ifndef STBI_NO_STDIO
   STBIDEF float *stbi_loadf_from_file  (FILE *f,                int *x, int *y, int *comp, int req_comp);
   #endif
#endif

#ifndef STBI_NO_HDR
   STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma);
   STBIDEF void   stbi_hdr_to_ldr_scale(float scale);
#endif

#ifndef STBI_NO_LINEAR
   STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma);
   STBIDEF void   stbi_ldr_to_hdr_scale(float scale);
#endif // STBI_NO_LINEAR

// stbi_is_hdr is always defined, but always returns false if STBI_NO_HDR
STBIDEF int    stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user);
STBIDEF int    stbi_is_hdr_from_memory(stbi_uc const *buffer, int len);
#ifndef STBI_NO_STDIO
STBIDEF int      stbi_is_hdr          (char const *filename);
STBIDEF int      stbi_is_hdr_from_file(FILE *f);
#endif // STBI_NO_STDIO


// get a VERY brief reason for failure
// NOT THREADSAFE
STBIDEF const char *stbi_failure_reason  (void);

// free the loaded image -- this is just free()
STBIDEF void     stbi_image_free      (void *retval_from_stbi_load);

// get image dimensions & components without fully decoding
STBIDEF int      stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp);
STBIDEF int      stbi_info_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp);

#ifndef STBI_NO_STDIO
STBIDEF int      stbi_info            (char const *filename,     int *x, int *y, int *comp);
STBIDEF int      stbi_info_from_file  (FILE *f,                  int *x, int *y, int *comp);

#endif



// for image formats that explicitly notate that they have premultiplied alpha,
// we just return the colors as stored in the file. set this flag to force
// unpremultiplication. results are undefined if the unpremultiply overflows.
STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply);

// indicate whether we should process iphone images back to canonical format,
// or just pass them through "as-is"
STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert);

// flip the image vertically, so the first pixel in the output array is the bottom left
STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip);

// ZLIB client - used by PNG, available for other purposes

STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen);
STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header);
STBIDEF char *stbi_zlib_decode_malloc(const char *buffer, int len, int *outlen);
STBIDEF int   stbi_zlib_decode_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);

STBIDEF char *stbi_zlib_decode_noheader_malloc(const char *buffer, int len, int *outlen);
STBIDEF int   stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);


#ifdef __cplusplus
}
#endif

//
//
////   end header file   /////////////////////////////////////////////////////
#endif // STBI_INCLUDE_STB_IMAGE_H

#ifdef STB_IMAGE_IMPLEMENTATION

#if defined(STBI_ONLY_JPEG) || defined(STBI_ONLY_PNG) || defined(STBI_ONLY_BMP) \
  || defined(STBI_ONLY_TGA) || defined(STBI_ONLY_GIF) || defined(STBI_ONLY_PSD) \
  || defined(STBI_ONLY_HDR) || defined(STBI_ONLY_PIC) || defined(STBI_ONLY_PNM) \
  || defined(STBI_ONLY_ZLIB)
   #ifndef STBI_ONLY_JPEG
   #define STBI_NO_JPEG
   #endif
   #ifndef STBI_ONLY_PNG
   #define STBI_NO_PNG
   #endif
   #ifndef STBI_ONLY_BMP
   #define STBI_NO_BMP
   #endif
   #ifndef STBI_ONLY_PSD
   #define STBI_NO_PSD
   #endif
   #ifndef STBI_ONLY_TGA
   #define STBI_NO_TGA
   #endif
   #ifndef STBI_ONLY_GIF
   #define STBI_NO_GIF
   #endif
   #ifndef STBI_ONLY_HDR
   #define STBI_NO_HDR
   #endif
   #ifndef STBI_ONLY_PIC
   #define STBI_NO_PIC
   #endif
   #ifndef STBI_ONLY_PNM
   #define STBI_NO_PNM
   #endif
#endif

#if defined(STBI_NO_PNG) && !defined(STBI_SUPPORT_ZLIB) && !defined(STBI_NO_ZLIB)
#define STBI_NO_ZLIB
#endif


#include <stdarg.h>
#include <stddef.h> // ptrdiff_t on osx
#include <stdlib.h>
#include <string.h>
#include <limits.h>

#if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
#include <math.h>  // ldexp
#endif

#ifndef STBI_NO_STDIO
#include <stdio.h>
#endif

#ifndef STBI_ASSERT
#include <assert.h>
#define STBI_ASSERT(x) assert(x)
#endif


#ifndef _MSC_VER
   #ifdef __cplusplus
   #define stbi_inline inline
   #else
   #define stbi_inline
   #endif
#else
   #define stbi_inline __forceinline
#endif


#ifdef _MSC_VER
typedef unsigned short stbi__uint16;
typedef   signed short stbi__int16;
typedef unsigned int   stbi__uint32;
typedef   signed int   stbi__int32;
#else
#include <stdint.h>
typedef uint16_t stbi__uint16;
typedef int16_t  stbi__int16;
typedef uint32_t stbi__uint32;
typedef int32_t  stbi__int32;
#endif

// should produce compiler error if size is wrong
typedef unsigned char validate_uint32[sizeof(stbi__uint32)==4 ? 1 : -1];

#ifdef _MSC_VER
#define STBI_NOTUSED(v)  (void)(v)
#else
#define STBI_NOTUSED(v)  (void)sizeof(v)
#endif

#ifdef _MSC_VER
#define STBI_HAS_LROTL
#endif

#ifdef STBI_HAS_LROTL
   #define stbi_lrot(x,y)  _lrotl(x,y)
#else
   #define stbi_lrot(x,y)  (((x) << (y)) | ((x) >> (32 - (y))))
#endif

#if defined(STBI_MALLOC) && defined(STBI_FREE) && defined(STBI_REALLOC)
// ok
#elif !defined(STBI_MALLOC) && !defined(STBI_FREE) && !defined(STBI_REALLOC)
// ok
#else
#error "Must define all or none of STBI_MALLOC, STBI_FREE, and STBI_REALLOC."
#endif

#ifndef STBI_MALLOC
#define STBI_MALLOC(sz)    malloc(sz)
#define STBI_REALLOC(p,sz) realloc(p,sz)
#define STBI_FREE(p)       free(p)
#endif

// x86/x64 detection
#if defined(__x86_64__) || defined(_M_X64)
#define STBI__X64_TARGET
#elif defined(__i386) || defined(_M_IX86)
#define STBI__X86_TARGET
#endif

#if defined(__GNUC__) && (defined(STBI__X86_TARGET) || defined(STBI__X64_TARGET)) && !defined(__SSE2__) && !defined(STBI_NO_SIMD)
// NOTE: it's not clear whether we actually need this for the 64-bit path.
// gcc doesn't support sse2 intrinsics unless you compile with -msse2
// (but compiling with -msse2 allows the compiler to use SSE2 everywhere;
// this is just broken and gcc are jerks for not fixing it properly
// http://www.virtualdub.org/blog/pivot/entry.php?id=363 )
#define STBI_NO_SIMD
#endif

#if defined(__MINGW32__) && defined(STBI__X86_TARGET) && !defined(STBI_MINGW_ENABLE_SSE2) && !defined(STBI_NO_SIMD)
// Note that __MINGW32__ doesn't actually mean 32-bit, so we have to avoid STBI__X64_TARGET
//
// 32-bit MinGW wants ESP to be 16-byte aligned, but this is not in the
// Windows ABI and VC++ as well as Windows DLLs don't maintain that invariant.
// As a result, enabling SSE2 on 32-bit MinGW is dangerous when not
// simultaneously enabling "-mstackrealign".
//
// See https://github.com/nothings/stb/issues/81 for more information.
//
// So default to no SSE2 on 32-bit MinGW. If you've read this far and added
// -mstackrealign to your build settings, feel free to #define STBI_MINGW_ENABLE_SSE2.
#define STBI_NO_SIMD
#endif

#if !defined(STBI_NO_SIMD) && defined(STBI__X86_TARGET)
#define STBI_SSE2
#include <emmintrin.h>

#ifdef _MSC_VER

#if _MSC_VER >= 1400  // not VC6
#include <intrin.h> // __cpuid
static int stbi__cpuid3(void)
{
   int info[4];
   __cpuid(info,1);
   return info[3];
}
#else
static int stbi__cpuid3(void)
{
   int res;
   __asm {
      mov  eax,1
      cpuid
      mov  res,edx
   }
   return res;
}
#endif

#define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name

static int stbi__sse2_available(void)
{
   int info3 = stbi__cpuid3();
   return ((info3 >> 26) & 1) != 0;
}
#else // assume GCC-style if not VC++
#define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))

static int stbi__sse2_available(void)
{
#if defined(__GNUC__) && (__GNUC__ * 100 + __GNUC_MINOR__) >= 408 // GCC 4.8 or later
   // GCC 4.8+ has a nice way to do this
   return __builtin_cpu_supports("sse2");
#else
   // portable way to do this, preferably without using GCC inline ASM?
   // just bail for now.
   return 0;
#endif
}
#endif
#endif

// ARM NEON
#if defined(STBI_NO_SIMD) && defined(STBI_NEON)
#undef STBI_NEON
#endif

#ifdef STBI_NEON
#include <arm_neon.h>
// assume GCC or Clang on ARM targets
#define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
#endif

#ifndef STBI_SIMD_ALIGN
#define STBI_SIMD_ALIGN(type, name) type name
#endif

745 ///////////////////////////////////////////////
746 //
747 //  stbi__context struct and start_xxx functions
748 
749 // stbi__context structure is our basic context used by all images, so it
750 // contains all the IO context, plus some basic image information
751 typedef struct
752 {
753    stbi__uint32 img_x, img_y;
754    int img_n, img_out_n;
755 
756    stbi_io_callbacks io;
757    void *io_user_data;
758 
759    int read_from_callbacks;
760    int buflen;
761    stbi_uc buffer_start[128];
762 
763    stbi_uc *img_buffer, *img_buffer_end;
764    stbi_uc *img_buffer_original, *img_buffer_original_end;
765 } stbi__context;
766 
767 
768 static void stbi__refill_buffer(stbi__context *s);
769 
770 // initialize a memory-decode context
771 static void stbi__start_mem(stbi__context *s, stbi_uc const *buffer, int len)
772 {
773    s->io.read = NULL;
774    s->read_from_callbacks = 0;
775    s->img_buffer = s->img_buffer_original = (stbi_uc *) buffer;
776    s->img_buffer_end = s->img_buffer_original_end = (stbi_uc *) buffer+len;
777 }
778 
779 // initialize a callback-based context
780 static void stbi__start_callbacks(stbi__context *s, stbi_io_callbacks *c, void *user)
781 {
782    s->io = *c;
783    s->io_user_data = user;
784    s->buflen = sizeof(s->buffer_start);
785    s->read_from_callbacks = 1;
786    s->img_buffer_original = s->buffer_start;
787    stbi__refill_buffer(s);
788    s->img_buffer_original_end = s->img_buffer_end;
789 }
790 
791 #ifndef STBI_NO_STDIO
792 
793 static int stbi__stdio_read(void *user, char *data, int size)
794 {
795    return (int) fread(data,1,size,(FILE*) user);
796 }
797 
798 static void stbi__stdio_skip(void *user, int n)
799 {
800    fseek((FILE*) user, n, SEEK_CUR);
801 }
802 
803 static int stbi__stdio_eof(void *user)
804 {
805    return feof((FILE*) user);
806 }
807 
808 static stbi_io_callbacks stbi__stdio_callbacks =
809 {
810    stbi__stdio_read,
811    stbi__stdio_skip,
812    stbi__stdio_eof,
813 };
814 
815 static void stbi__start_file(stbi__context *s, FILE *f)
816 {
817    stbi__start_callbacks(s, &stbi__stdio_callbacks, (void *) f);
818 }
819 
820 //static void stop_file(stbi__context *s) { }
821 
822 #endif // !STBI_NO_STDIO
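// Example (not part of stb_image): the stdio callbacks above suggest the shape
// of the three functions a custom stream has to provide: read, skip and eof.
// The sketch below mirrors those signatures for an in-memory stream. The
// names example_mem_stream, example_mem_read, example_mem_skip and
// example_mem_eof are hypothetical, and the clamping behavior in skip is an
// assumption rather than something the library requires.

```c
#include <string.h>

/* Hypothetical in-memory stream; not part of stb_image. */
typedef struct
{
   const unsigned char *data;
   int size;
   int pos;
} example_mem_stream;

/* read: copy up to 'size' bytes into 'data', return the number actually read */
static int example_mem_read(void *user, char *data, int size)
{
   example_mem_stream *m = (example_mem_stream *) user;
   int left = m->size - m->pos;
   if (size > left) size = left;
   if (size <= 0) return 0;
   memcpy(data, m->data + m->pos, (size_t) size);
   m->pos += size;
   return size;
}

/* skip: advance by n bytes, clamped to the bounds of the stream */
static void example_mem_skip(void *user, int n)
{
   example_mem_stream *m = (example_mem_stream *) user;
   int p = m->pos + n;
   if (p < 0) p = 0;
   if (p > m->size) p = m->size;
   m->pos = p;
}

/* eof: nonzero once every byte has been consumed */
static int example_mem_eof(void *user)
{
   example_mem_stream *m = (example_mem_stream *) user;
   return m->pos >= m->size;
}
```

// The three functions would be stored in a stbi_io_callbacks value in the same
// order as stbi__stdio_callbacks above, with a pointer to the stream passed as
// the user argument.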
823 
824 static void stbi__rewind(stbi__context *s)
825 {
826    // conceptually rewind SHOULD rewind to the beginning of the stream,
827    // but we just rewind to the beginning of the initial buffer, because
828    // we only use it after doing 'test', which only ever looks at at most 92 bytes
829    s->img_buffer = s->img_buffer_original;
830    s->img_buffer_end = s->img_buffer_original_end;
831 }
832 
833 #ifndef STBI_NO_JPEG
834 static int      stbi__jpeg_test(stbi__context *s);
835 static stbi_uc *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
836 static int      stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp);
837 #endif
838 
839 #ifndef STBI_NO_PNG
840 static int      stbi__png_test(stbi__context *s);
841 static stbi_uc *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
842 static int      stbi__png_info(stbi__context *s, int *x, int *y, int *comp);
843 #endif
844 
845 #ifndef STBI_NO_BMP
846 static int      stbi__bmp_test(stbi__context *s);
847 static stbi_uc *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
848 static int      stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp);
849 #endif
850 
851 #ifndef STBI_NO_TGA
852 static int      stbi__tga_test(stbi__context *s);
853 static stbi_uc *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
854 static int      stbi__tga_info(stbi__context *s, int *x, int *y, int *comp);
855 #endif
856 
857 #ifndef STBI_NO_PSD
858 static int      stbi__psd_test(stbi__context *s);
859 static stbi_uc *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
860 static int      stbi__psd_info(stbi__context *s, int *x, int *y, int *comp);
861 #endif
862 
863 #ifndef STBI_NO_HDR
864 static int      stbi__hdr_test(stbi__context *s);
865 static float   *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
866 static int      stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp);
867 #endif
868 
869 #ifndef STBI_NO_PIC
870 static int      stbi__pic_test(stbi__context *s);
871 static stbi_uc *stbi__pic_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
872 static int      stbi__pic_info(stbi__context *s, int *x, int *y, int *comp);
873 #endif
874 
875 #ifndef STBI_NO_GIF
876 static int      stbi__gif_test(stbi__context *s);
877 static stbi_uc *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
878 static int      stbi__gif_info(stbi__context *s, int *x, int *y, int *comp);
879 #endif
880 
881 #ifndef STBI_NO_PNM
882 static int      stbi__pnm_test(stbi__context *s);
883 static stbi_uc *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
884 static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp);
885 #endif
886 
887 // this is not threadsafe
888 static const char *stbi__g_failure_reason;
889 
890 STBIDEF const char *stbi_failure_reason(void)
891 {
892    return stbi__g_failure_reason;
893 }
894 
895 static int stbi__err(const char *str)
896 {
897    stbi__g_failure_reason = str;
898    return 0;
899 }
900 
901 static void *stbi__malloc(size_t size)
902 {
903     return STBI_MALLOC(size);
904 }
905 
906 // stbi__err - error
907 // stbi__errpf - error returning pointer to float
908 // stbi__errpuc - error returning pointer to unsigned char
909 
910 #ifdef STBI_NO_FAILURE_STRINGS
911    #define stbi__err(x,y)  0
912 #elif defined(STBI_FAILURE_USERMSG)
913    #define stbi__err(x,y)  stbi__err(y)
914 #else
915    #define stbi__err(x,y)  stbi__err(x)
916 #endif
917 
918 #define stbi__errpf(x,y)   ((float *)(size_t) (stbi__err(x,y)?NULL:NULL))
919 #define stbi__errpuc(x,y)  ((unsigned char *)(size_t) (stbi__err(x,y)?NULL:NULL))
920 
921 STBIDEF void stbi_image_free(void *retval_from_stbi_load)
922 {
923    STBI_FREE(retval_from_stbi_load);
924 }
925 
926 #ifndef STBI_NO_LINEAR
927 static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp);
928 #endif
929 
930 #ifndef STBI_NO_HDR
931 static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp);
932 #endif
933 
934 static int stbi__vertically_flip_on_load = 0;
935 
936 STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip)
937 {
938     stbi__vertically_flip_on_load = flag_true_if_should_flip;
939 }
940 
941 static unsigned char *stbi__load_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
942 {
943    #ifndef STBI_NO_JPEG
944    if (stbi__jpeg_test(s)) return stbi__jpeg_load(s,x,y,comp,req_comp);
945    #endif
946    #ifndef STBI_NO_PNG
947    if (stbi__png_test(s))  return stbi__png_load(s,x,y,comp,req_comp);
948    #endif
949    #ifndef STBI_NO_BMP
950    if (stbi__bmp_test(s))  return stbi__bmp_load(s,x,y,comp,req_comp);
951    #endif
952    #ifndef STBI_NO_GIF
953    if (stbi__gif_test(s))  return stbi__gif_load(s,x,y,comp,req_comp);
954    #endif
955    #ifndef STBI_NO_PSD
956    if (stbi__psd_test(s))  return stbi__psd_load(s,x,y,comp,req_comp);
957    #endif
958    #ifndef STBI_NO_PIC
959    if (stbi__pic_test(s))  return stbi__pic_load(s,x,y,comp,req_comp);
960    #endif
961    #ifndef STBI_NO_PNM
962    if (stbi__pnm_test(s))  return stbi__pnm_load(s,x,y,comp,req_comp);
963    #endif
964 
965    #ifndef STBI_NO_HDR
966    if (stbi__hdr_test(s)) {
967       float *hdr = stbi__hdr_load(s, x,y,comp,req_comp);
968       if (hdr == NULL) {
969          return NULL;
970       }
971       return stbi__hdr_to_ldr(hdr, *x, *y, req_comp ? req_comp : *comp);
972    }
973    #endif
974 
975    #ifndef STBI_NO_TGA
976    // test tga last because it's a crappy test!
977    if (stbi__tga_test(s))
978       return stbi__tga_load(s,x,y,comp,req_comp);
979    #endif
980 
981    return stbi__errpuc("unknown image type", "Image not of any known type, or corrupt");
982 }
983 
984 static unsigned char *stbi__load_flip(stbi__context *s, int *x, int *y, int *comp, int req_comp)
985 {
986    unsigned char *result = stbi__load_main(s, x, y, comp, req_comp);
987 
988    if (stbi__vertically_flip_on_load && result != NULL) {
989       int w = *x, h = *y;
990       int depth = req_comp ? req_comp : *comp;
991       int row,col,z;
992       stbi_uc temp;
993 
994       // @OPTIMIZE: use a bigger temp buffer and memcpy multiple pixels at once
995       for (row = 0; row < (h>>1); row++) {
996          for (col = 0; col < w; col++) {
997             for (z = 0; z < depth; z++) {
998                temp = result[(row * w + col) * depth + z];
999                result[(row * w + col) * depth + z] = result[((h - row - 1) * w + col) * depth + z];
1000                result[((h - row - 1) * w + col) * depth + z] = temp;
1001             }
1002          }
1003       }
1004    }
1005 
1006    return result;
1007 }
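// Example (not part of stb_image): the @OPTIMIZE note above asks for a bigger
// temp buffer and memcpy. A minimal sketch of that idea follows. The helper
// name example_flip_rows and the 256-byte chunk size are assumptions chosen
// for illustration; row_bytes stands for w * depth in the loop above.

```c
#include <string.h>

/* Hypothetical helper: flip an image of h rows, each row_bytes wide, in place.
   Rows are swapped chunk by chunk through a small stack buffer. */
static void example_flip_rows(unsigned char *pixels, int h, size_t row_bytes)
{
   unsigned char temp[256];
   int row;
   for (row = 0; row < (h >> 1); row++) {
      unsigned char *top = pixels + (size_t) row * row_bytes;
      unsigned char *bot = pixels + (size_t) (h - row - 1) * row_bytes;
      size_t left = row_bytes;
      while (left > 0) {
         size_t n = left < sizeof(temp) ? left : sizeof(temp);
         memcpy(temp, top, n);   /* save a chunk of the upper row   */
         memcpy(top, bot, n);    /* lower row chunk moves up        */
         memcpy(bot, temp, n);   /* saved chunk moves down          */
         top += n; bot += n; left -= n;
      }
   }
}
```

// With an odd number of rows the middle row is left untouched, which matches
// the (h>>1) bound used by the per-byte loop.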
1008 
1009 #ifndef STBI_NO_HDR
1010 static void stbi__float_postprocess(float *result, int *x, int *y, int *comp, int req_comp)
1011 {
1012    if (stbi__vertically_flip_on_load && result != NULL) {
1013       int w = *x, h = *y;
1014       int depth = req_comp ? req_comp : *comp;
1015       int row,col,z;
1016       float temp;
1017 
1018       // @OPTIMIZE: use a bigger temp buffer and memcpy multiple pixels at once
1019       for (row = 0; row < (h>>1); row++) {
1020          for (col = 0; col < w; col++) {
1021             for (z = 0; z < depth; z++) {
1022                temp = result[(row * w + col) * depth + z];
1023                result[(row * w + col) * depth + z] = result[((h - row - 1) * w + col) * depth + z];
1024                result[((h - row - 1) * w + col) * depth + z] = temp;
1025             }
1026          }
1027       }
1028    }
1029 }
1030 #endif
1031 
1032 #ifndef STBI_NO_STDIO
1033 
1034 static FILE *stbi__fopen(char const *filename, char const *mode)
1035 {
1036    FILE *f;
1037 #if defined(_MSC_VER) && _MSC_VER >= 1400
1038    if (0 != fopen_s(&f, filename, mode))
1039       f=0;
1040 #else
1041    f = fopen(filename, mode);
1042 #endif
1043    return f;
1044 }
1045 
1046 
1047 STBIDEF stbi_uc *stbi_load(char const *filename, int *x, int *y, int *comp, int req_comp)
1048 {
1049    FILE *f = stbi__fopen(filename, "rb");
1050    unsigned char *result;
1051    if (!f) return stbi__errpuc("can't fopen", "Unable to open file");
1052    result = stbi_load_from_file(f,x,y,comp,req_comp);
1053    fclose(f);
1054    return result;
1055 }
1056 
1057 STBIDEF stbi_uc *stbi_load_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
1058 {
1059    unsigned char *result;
1060    stbi__context s;
1061    stbi__start_file(&s,f);
1062    result = stbi__load_flip(&s,x,y,comp,req_comp);
1063    if (result) {
1064       // need to 'unget' all the characters in the IO buffer
1065       fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
1066    }
1067    return result;
1068 }
1069 #endif //!STBI_NO_STDIO
1070 
1071 STBIDEF stbi_uc *stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
1072 {
1073    stbi__context s;
1074    stbi__start_mem(&s,buffer,len);
1075    return stbi__load_flip(&s,x,y,comp,req_comp);
1076 }
1077 
1078 STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
1079 {
1080    stbi__context s;
1081    stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1082    return stbi__load_flip(&s,x,y,comp,req_comp);
1083 }
1084 
1085 #ifndef STBI_NO_LINEAR
1086 static float *stbi__loadf_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
1087 {
1088    unsigned char *data;
1089    #ifndef STBI_NO_HDR
1090    if (stbi__hdr_test(s)) {
1091       float *hdr_data = stbi__hdr_load(s,x,y,comp,req_comp);
1092       if (hdr_data)
1093          stbi__float_postprocess(hdr_data,x,y,comp,req_comp);
1094       return hdr_data;
1095    }
1096    #endif
1097    data = stbi__load_flip(s, x, y, comp, req_comp);
1098    if (data)
1099       return stbi__ldr_to_hdr(data, *x, *y, req_comp ? req_comp : *comp);
1100    return stbi__errpf("unknown image type", "Image not of any known type, or corrupt");
1101 }
1102 
1103 STBIDEF float *stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
1104 {
1105    stbi__context s;
1106    stbi__start_mem(&s,buffer,len);
1107    return stbi__loadf_main(&s,x,y,comp,req_comp);
1108 }
1109 
1110 STBIDEF float *stbi_loadf_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
1111 {
1112    stbi__context s;
1113    stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1114    return stbi__loadf_main(&s,x,y,comp,req_comp);
1115 }
1116 
1117 #ifndef STBI_NO_STDIO
1118 STBIDEF float *stbi_loadf(char const *filename, int *x, int *y, int *comp, int req_comp)
1119 {
1120    float *result;
1121    FILE *f = stbi__fopen(filename, "rb");
1122    if (!f) return stbi__errpf("can't fopen", "Unable to open file");
1123    result = stbi_loadf_from_file(f,x,y,comp,req_comp);
1124    fclose(f);
1125    return result;
1126 }
1127 
1128 STBIDEF float *stbi_loadf_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
1129 {
1130    stbi__context s;
1131    stbi__start_file(&s,f);
1132    return stbi__loadf_main(&s,x,y,comp,req_comp);
1133 }
1134 #endif // !STBI_NO_STDIO
1135 
1136 #endif // !STBI_NO_LINEAR
1137 
1138 // these is-hdr-or-not functions are defined regardless of whether
1139 // STBI_NO_LINEAR is defined, for API simplicity; if STBI_NO_HDR is defined,
1140 // they always report false!
1141 
1142 STBIDEF int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len)
1143 {
1144    #ifndef STBI_NO_HDR
1145    stbi__context s;
1146    stbi__start_mem(&s,buffer,len);
1147    return stbi__hdr_test(&s);
1148    #else
1149    STBI_NOTUSED(buffer);
1150    STBI_NOTUSED(len);
1151    return 0;
1152    #endif
1153 }
1154 
1155 #ifndef STBI_NO_STDIO
1156 STBIDEF int      stbi_is_hdr          (char const *filename)
1157 {
1158    FILE *f = stbi__fopen(filename, "rb");
1159    int result=0;
1160    if (f) {
1161       result = stbi_is_hdr_from_file(f);
1162       fclose(f);
1163    }
1164    return result;
1165 }
1166 
1167 STBIDEF int      stbi_is_hdr_from_file(FILE *f)
1168 {
1169    #ifndef STBI_NO_HDR
1170    stbi__context s;
1171    stbi__start_file(&s,f);
1172    return stbi__hdr_test(&s);
1173    #else
1174    STBI_NOTUSED(f);
1175    return 0;
1176    #endif
1177 }
1178 #endif // !STBI_NO_STDIO
1179 
1180 STBIDEF int      stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user)
1181 {
1182    #ifndef STBI_NO_HDR
1183    stbi__context s;
1184    stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1185    return stbi__hdr_test(&s);
1186    #else
1187    STBI_NOTUSED(clbk);
1188    STBI_NOTUSED(user);
1189    return 0;
1190    #endif
1191 }
1192 
1193 static float stbi__h2l_gamma_i=1.0f/2.2f, stbi__h2l_scale_i=1.0f;
1194 static float stbi__l2h_gamma=2.2f, stbi__l2h_scale=1.0f;
1195 
1196 #ifndef STBI_NO_LINEAR
1197 STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma) { stbi__l2h_gamma = gamma; }
1198 STBIDEF void   stbi_ldr_to_hdr_scale(float scale) { stbi__l2h_scale = scale; }
1199 #endif
1200 
1201 STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma) { stbi__h2l_gamma_i = 1/gamma; }
1202 STBIDEF void   stbi_hdr_to_ldr_scale(float scale) { stbi__h2l_scale_i = 1/scale; }
1203 
1204 
1205 //////////////////////////////////////////////////////////////////////////////
1206 //
1207 // Common code used by all image loaders
1208 //
1209 
1210 enum
1211 {
1212    STBI__SCAN_load=0,
1213    STBI__SCAN_type,
1214    STBI__SCAN_header
1215 };
1216 
1217 static void stbi__refill_buffer(stbi__context *s)
1218 {
1219    int n = (s->io.read)(s->io_user_data,(char*)s->buffer_start,s->buflen);
1220    if (n == 0) {
1221       // at end of file, treat same as if from memory, but need to handle case
1222       // where s->img_buffer isn't pointing to safe memory, e.g. 0-byte file
1223       s->read_from_callbacks = 0;
1224       s->img_buffer = s->buffer_start;
1225       s->img_buffer_end = s->buffer_start+1;
1226       *s->img_buffer = 0;
1227    } else {
1228       s->img_buffer = s->buffer_start;
1229       s->img_buffer_end = s->buffer_start + n;
1230    }
1231 }
1232 
1233 stbi_inline static stbi_uc stbi__get8(stbi__context *s)
1234 {
1235    if (s->img_buffer < s->img_buffer_end)
1236       return *s->img_buffer++;
1237    if (s->read_from_callbacks) {
1238       stbi__refill_buffer(s);
1239       return *s->img_buffer++;
1240    }
1241    return 0;
1242 }
1243 
1244 stbi_inline static int stbi__at_eof(stbi__context *s)
1245 {
1246    if (s->io.read) {
1247       if (!(s->io.eof)(s->io_user_data)) return 0;
1248       // if feof() is true, check if buffer = end
1249       // special case: we've only got the special 0 character at the end
1250       if (s->read_from_callbacks == 0) return 1;
1251    }
1252 
1253    return s->img_buffer >= s->img_buffer_end;
1254 }
1255 
1256 static void stbi__skip(stbi__context *s, int n)
1257 {
1258    if (n < 0) {
1259       s->img_buffer = s->img_buffer_end;
1260       return;
1261    }
1262    if (s->io.read) {
1263       int blen = (int) (s->img_buffer_end - s->img_buffer);
1264       if (blen < n) {
1265          s->img_buffer = s->img_buffer_end;
1266          (s->io.skip)(s->io_user_data, n - blen);
1267          return;
1268       }
1269    }
1270    s->img_buffer += n;
1271 }
1272 
1273 static int stbi__getn(stbi__context *s, stbi_uc *buffer, int n)
1274 {
1275    if (s->io.read) {
1276       int blen = (int) (s->img_buffer_end - s->img_buffer);
1277       if (blen < n) {
1278          int res, count;
1279 
1280          memcpy(buffer, s->img_buffer, blen);
1281 
1282          count = (s->io.read)(s->io_user_data, (char*) buffer + blen, n - blen);
1283          res = (count == (n-blen));
1284          s->img_buffer = s->img_buffer_end;
1285          return res;
1286       }
1287    }
1288 
1289    if (s->img_buffer+n <= s->img_buffer_end) {
1290       memcpy(buffer, s->img_buffer, n);
1291       s->img_buffer += n;
1292       return 1;
1293    } else
1294       return 0;
1295 }
1296 
1297 static int stbi__get16be(stbi__context *s)
1298 {
1299    int z = stbi__get8(s);
1300    return (z << 8) + stbi__get8(s);
1301 }
1302 
1303 static stbi__uint32 stbi__get32be(stbi__context *s)
1304 {
1305    stbi__uint32 z = stbi__get16be(s);
1306    return (z << 16) + stbi__get16be(s);
1307 }
1308 
1309 #if defined(STBI_NO_BMP) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF)
1310 // nothing
1311 #else
1312 static int stbi__get16le(stbi__context *s)
1313 {
1314    int z = stbi__get8(s);
1315    return z + (stbi__get8(s) << 8);
1316 }
1317 #endif
1318 
1319 #ifndef STBI_NO_BMP
1320 static stbi__uint32 stbi__get32le(stbi__context *s)
1321 {
1322    stbi__uint32 z = stbi__get16le(s);
1323    return z + (stbi__get16le(s) << 16);
1324 }
1325 #endif
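// Example (not part of stb_image): the get16/get32 helpers above assemble
// multi-byte integers one byte at a time, so the result does not depend on the
// host byte order. The sketch below shows the same composition on a plain byte
// array instead of a stbi__context. The example_read* names are hypothetical,
// and unsigned types are used here to keep every shift well defined.

```c
/* Hypothetical stand-alone readers; they take a byte pointer rather than a stream. */
static unsigned int example_read16be(const unsigned char *p)
{
   return ((unsigned int) p[0] << 8) | p[1];      /* first byte is most significant */
}

static unsigned int example_read16le(const unsigned char *p)
{
   return p[0] | ((unsigned int) p[1] << 8);      /* first byte is least significant */
}

static unsigned long example_read32be(const unsigned char *p)
{
   return ((unsigned long) example_read16be(p) << 16) | example_read16be(p + 2);
}

static unsigned long example_read32le(const unsigned char *p)
{
   return example_read16le(p) | ((unsigned long) example_read16le(p + 2) << 16);
}
```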
1326 
1327 #define STBI__BYTECAST(x)  ((stbi_uc) ((x) & 255))  // truncate int to byte without warnings
1328 
1329 
1330 //////////////////////////////////////////////////////////////////////////////
1331 //
1332 //  generic converter from built-in img_n to req_comp
1333 //    individual types do this automatically as much as possible (e.g. jpeg
1334 //    does all cases internally since it needs to colorspace convert anyway,
1335 //    and it never has alpha, so very few cases ). png can automatically
1336 //    interleave an alpha=255 channel, but falls back to this for other cases
1337 //
1338 //  assume data buffer is malloced, so malloc a new one and free that one
1339 //  only failure mode is malloc failing
1340 
1341 static stbi_uc stbi__compute_y(int r, int g, int b)
1342 {
1343    return (stbi_uc) (((r*77) + (g*150) +  (29*b)) >> 8);
1344 }
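// Example (not part of stb_image): the weights in stbi__compute_y sum to
// 77 + 150 + 29 = 256, so the >> 8 divides by the weight total and a gray
// input maps back to the same gray level. They look like the common
// 0.299 / 0.587 / 0.114 luma coefficients scaled by 256 and rounded, though
// the source does not say so. The stand-alone copy below (hypothetical name
// example_luma) makes the arithmetic easy to check by hand.

```c
/* Hypothetical stand-alone copy of the fixed-point luma computation. */
static unsigned char example_luma(int r, int g, int b)
{
   /* weights sum to 256, so shifting right by 8 normalizes the result to 0..255 */
   return (unsigned char) ((r * 77 + g * 150 + b * 29) >> 8);
}
```

// For instance, pure red (255,0,0) gives (255*77) >> 8 = 19635 >> 8 = 76.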
1345 
1346 static unsigned char *stbi__convert_format(unsigned char *data, int img_n, int req_comp, unsigned int x, unsigned int y)
1347 {
1348    int i,j;
1349    unsigned char *good;
1350 
1351    if (req_comp == img_n) return data;
1352    STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
1353 
1354    if (x == 0 || y == 0 || req_comp <= 0 || (req_comp > INT_MAX / x / y))
1355        { STBI_FREE(data); return stbi__errpuc("Integer OverFlow", "x or y is bad"); } // free input on failure, like the outofmem path
1356 
1357    good = (unsigned char *) stbi__malloc(req_comp * x * y);
1358    if (good == NULL) {
1359       STBI_FREE(data);
1360       return stbi__errpuc("outofmem", "Out of memory");
1361    }
1362 
1363    for (j=0; j < (int) y; ++j) {
1364       unsigned char *src  = data + j * x * img_n   ;
1365       unsigned char *dest = good + j * x * req_comp;
1366 
1367       #define COMBO(a,b)  ((a)*8+(b))
1368       #define CASE(a,b)   case COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
1369       // convert source image with img_n components to one with req_comp components;
1370       // avoid switch per pixel, so use switch per scanline and massive macros
1371       switch (COMBO(img_n, req_comp)) {
1372          CASE(1,2) dest[0]=src[0], dest[1]=255; break;
1373          CASE(1,3) dest[0]=dest[1]=dest[2]=src[0]; break;
1374          CASE(1,4) dest[0]=dest[1]=dest[2]=src[0], dest[3]=255; break;
1375          CASE(2,1) dest[0]=src[0]; break;
1376          CASE(2,3) dest[0]=dest[1]=dest[2]=src[0]; break;
1377          CASE(2,4) dest[0]=dest[1]=dest[2]=src[0], dest[3]=src[1]; break;
1378          CASE(3,4) dest[0]=src[0],dest[1]=src[1],dest[2]=src[2],dest[3]=255; break;
1379          CASE(3,1) dest[0]=stbi__compute_y(src[0],src[1],src[2]); break;
1380          CASE(3,2) dest[0]=stbi__compute_y(src[0],src[1],src[2]), dest[1] = 255; break;
1381          CASE(4,1) dest[0]=stbi__compute_y(src[0],src[1],src[2]); break;
1382          CASE(4,2) dest[0]=stbi__compute_y(src[0],src[1],src[2]), dest[1] = src[3]; break;
1383          CASE(4,3) dest[0]=src[0],dest[1]=src[1],dest[2]=src[2]; break;
1384          default: STBI_ASSERT(0);
1385       }
1386       #undef CASE
1387    }
1388 
1389    STBI_FREE(data);
1390    return good;
1391 }
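// Example (not part of stb_image): the size check in stbi__convert_format
// divides INT_MAX by the dimensions instead of multiplying them, because the
// multiplication itself is what might overflow. A minimal sketch of the same
// idea for three positive int factors follows; example_mul3_fits_int is a
// hypothetical helper, not a library function.

```c
#include <limits.h>

/* Returns 1 if a*b*c fits in an int, 0 otherwise.
   Zero or negative operands are rejected, as in the check above. */
static int example_mul3_fits_int(int a, int b, int c)
{
   if (a <= 0 || b <= 0 || c <= 0) return 0;
   if (a > INT_MAX / b) return 0;       /* a*b would overflow     */
   if (a * b > INT_MAX / c) return 0;   /* (a*b)*c would overflow */
   return 1;
}
```

// A caller would test the dimensions with this before computing the
// allocation size, and report an error instead of allocating when it fails.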
1392 
1393 #ifndef STBI_NO_LINEAR
1394 static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp)
1395 {
1396    int i,k,n;
1397    float *output; // declared up front so the function stays valid C89
1398    if (x <= 0 || y <= 0 || comp <= 0 ||
1399            (sizeof(float) > INT_MAX / x / y / comp))
1400        { STBI_FREE(data); return stbi__errpf("Integer OverFlow", "x, y or comp is too large"); }
1401 
1402    output = (float *) stbi__malloc(x * y * comp * sizeof(float));
1403    if (output == NULL) { STBI_FREE(data); return stbi__errpf("outofmem", "Out of memory"); }
1404    // compute number of non-alpha components
1405    if (comp & 1) n = comp; else n = comp-1;
1406    for (i=0; i < x*y; ++i) {
1407       for (k=0; k < n; ++k) {
1408          output[i*comp + k] = (float) (pow(data[i*comp+k]/255.0f, stbi__l2h_gamma) * stbi__l2h_scale);
1409       }
1410       if (k < comp) output[i*comp + k] = data[i*comp+k]/255.0f;
1411    }
1412    STBI_FREE(data);
1413    return output;
1414 }
1415 #endif
1416 
1417 #ifndef STBI_NO_HDR
1418 #define stbi__float2int(x)   ((int) (x))
1419 static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp)
1420 {
1421    int i,k,n;
1422    stbi_uc *output; // declared up front so the function stays valid C89
1423    if (x <= 0 || y <= 0 || comp <= 0 ||
1424            (comp > INT_MAX / x / y))
1425        { STBI_FREE(data); return stbi__errpuc("Integer OverFlow", "x or y is too large"); }
1426 
1427    output = (stbi_uc *) stbi__malloc(x * y * comp);
1428    if (output == NULL) { STBI_FREE(data); return stbi__errpuc("outofmem", "Out of memory"); }
1429    // compute number of non-alpha components
1430    if (comp & 1) n = comp; else n = comp-1;
1431    for (i=0; i < x*y; ++i) {
1432       for (k=0; k < n; ++k) {
1433          float z = (float) pow(data[i*comp+k]*stbi__h2l_scale_i, stbi__h2l_gamma_i) * 255 + 0.5f;
1434          if (z < 0) z = 0;
1435          if (z > 255) z = 255;
1436          output[i*comp + k] = (stbi_uc) stbi__float2int(z);
1437       }
1438       if (k < comp) {
1439          float z = data[i*comp+k] * 255 + 0.5f;
1440          if (z < 0) z = 0;
1441          if (z > 255) z = 255;
1442          output[i*comp + k] = (stbi_uc) stbi__float2int(z);
1443       }
1444    }
1445    STBI_FREE(data);
1446    return output;
1447 }
1448 #endif
1449 
1450 //////////////////////////////////////////////////////////////////////////////
1451 //
1452 //  "baseline" JPEG/JFIF decoder
1453 //
1454 //    simple implementation
1455 //      - doesn't support delayed output of y-dimension
1456 //      - simple interface (only one output format: 8-bit interleaved RGB)
1457 //      - doesn't try to recover corrupt jpegs
1458 //      - doesn't allow partial loading, loading multiple at once
1459 //      - still fast on x86 (copying globals into locals doesn't help x86)
1460 //      - allocates lots of intermediate memory (full size of all components)
1461 //        - non-interleaved case requires this anyway
1462 //        - allows good upsampling (see next)
1463 //    high-quality
1464 //      - upsampled channels are bilinearly interpolated, even across blocks
1465 //      - quality integer IDCT derived from IJG's 'slow'
1466 //    performance
1467 //      - fast huffman; reasonable integer IDCT
1468 //      - some SIMD kernels for common paths on targets with SSE2/NEON
1469 //      - uses a lot of intermediate memory, could cache poorly
1470 
1471 #ifndef STBI_NO_JPEG
1472 
1473 // huffman decoding acceleration
1474 #define FAST_BITS   9  // larger handles more cases; smaller stomps less cache
1475 
1476 typedef struct
1477 {
1478    stbi_uc  fast[1 << FAST_BITS];
1479    // weirdly, repacking this into AoS is a 10% speed loss, instead of a win
1480    stbi__uint16 code[256];
1481    stbi_uc  values[256];
1482    stbi_uc  size[257];
1483    unsigned int maxcode[18];
1484    int    delta[17];   // old 'firstsymbol' - old 'firstcode'
1485 } stbi__huffman;
1486 
1487 typedef struct
1488 {
1489    stbi__context *s;
1490    stbi__huffman huff_dc[4];
1491    stbi__huffman huff_ac[4];
1492    stbi_uc dequant[4][64];
1493    stbi__int16 fast_ac[4][1 << FAST_BITS];
1494 
1495 // sizes for components, interleaved MCUs
1496    int img_h_max, img_v_max;
1497    int img_mcu_x, img_mcu_y;
1498    int img_mcu_w, img_mcu_h;
1499 
1500 // definition of jpeg image component
1501    struct
1502    {
1503       int id;
1504       int h,v;
1505       int tq;
1506       int hd,ha;
1507       int dc_pred;
1508 
1509       int x,y,w2,h2;
1510       stbi_uc *data;
1511       void *raw_data, *raw_coeff;
1512       stbi_uc *linebuf;
1513       short   *coeff;   // progressive only
1514       int      coeff_w, coeff_h; // number of 8x8 coefficient blocks
1515    } img_comp[4];
1516 
1517    stbi__uint32   code_buffer; // jpeg entropy-coded buffer
1518    int            code_bits;   // number of valid bits
1519    unsigned char  marker;      // marker seen while filling entropy buffer
1520    int            nomore;      // flag if we saw a marker so must stop
1521 
1522    int            progressive;
1523    int            spec_start;
1524    int            spec_end;
1525    int            succ_high;
1526    int            succ_low;
1527    int            eob_run;
1528 
1529    int scan_n, order[4];
1530    int restart_interval, todo;
1531 
1532 // kernels
1533    void (*idct_block_kernel)(stbi_uc *out, int out_stride, short data[64]);
1534    void (*YCbCr_to_RGB_kernel)(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step);
1535    stbi_uc *(*resample_row_hv_2_kernel)(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs);
1536 } stbi__jpeg;
1537 
1538 static int stbi__build_huffman(stbi__huffman *h, int *count)
1539 {
1540    int i,j,k=0,code;
1541    // build size list for each symbol (from JPEG spec)
1542    for (i=0; i < 16; ++i)
1543       for (j=0; j < count[i]; ++j)
1544          h->size[k++] = (stbi_uc) (i+1);
1545    h->size[k] = 0;
1546 
1547    // compute actual symbols (from jpeg spec)
1548    code = 0;
1549    k = 0;
1550    for(j=1; j <= 16; ++j) {
1551       // compute delta to add to code to compute symbol id
1552       h->delta[j] = k - code;
1553       if (h->size[k] == j) {
1554          while (h->size[k] == j)
1555             h->code[k++] = (stbi__uint16) (code++);
1556          if (code-1 >= (1 << j)) return stbi__err("bad code lengths","Corrupt JPEG");
1557       }
1558       // compute largest code + 1 for this size, preshifted as needed later
1559       h->maxcode[j] = code << (16-j);
1560       code <<= 1;
1561    }
1562    h->maxcode[j] = 0xffffffff;
1563 
1564    // build non-spec acceleration table; 255 is flag for not-accelerated
1565    memset(h->fast, 255, 1 << FAST_BITS);
1566    for (i=0; i < k; ++i) {
1567       int s = h->size[i];
1568       if (s <= FAST_BITS) {
1569          int c = h->code[i] << (FAST_BITS-s);
1570          int m = 1 << (FAST_BITS-s);
1571          for (j=0; j < m; ++j) {
1572             h->fast[c+j] = (stbi_uc) i;
1573          }
1574       }
1575    }
1576    return 1;
1577 }
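// Example (not part of stb_image): the first half of stbi__build_huffman
// assigns canonical codes. Symbols are taken in order of code length, each
// receives the next consecutive code, and the running code is doubled when
// moving to the next length. The sketch below isolates just that step.
// example_assign_codes is a hypothetical helper, and it omits the maxcode,
// delta and fast tables built above.

```c
/* count[i] = number of symbols whose code is i+1 bits long (i = 0..15).
   Fills size[] and code[] in symbol order and returns the symbol count,
   or -1 if the counts are negative, exceed 256 symbols, or do not fit
   in the code space. */
static int example_assign_codes(const int count[16], unsigned char size[256], unsigned short code[256])
{
   int len, j, k = 0, next = 0, total = 0;
   for (len = 0; len < 16; ++len) {
      if (count[len] < 0) return -1;
      total += count[len];
      if (total > 256) return -1;
   }
   for (len = 1; len <= 16; ++len) {
      for (j = 0; j < count[len-1]; ++j) {
         size[k] = (unsigned char) len;
         code[k] = (unsigned short) next++;   /* consecutive codes within one length */
         ++k;
      }
      if (next > (1 << len)) return -1;       /* more codes than 'len' bits can hold */
      next <<= 1;                             /* append a 0 bit for the next length  */
   }
   return k;
}
```

// With two 2-bit symbols and three 3-bit symbols, the codes come out as
// 00, 01, 100, 101, 110.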
1578 
1579 // build a table that decodes both magnitude and value of small ACs in
1580 // one go.
1581 static void stbi__build_fast_ac(stbi__int16 *fast_ac, stbi__huffman *h)
1582 {
1583    int i;
1584    for (i=0; i < (1 << FAST_BITS); ++i) {
1585       stbi_uc fast = h->fast[i];
1586       fast_ac[i] = 0;
1587       if (fast < 255) {
1588          int rs = h->values[fast];
1589          int run = (rs >> 4) & 15;
1590          int magbits = rs & 15;
1591          int len = h->size[fast];
1592 
1593          if (magbits && len + magbits <= FAST_BITS) {
1594             // magnitude code followed by receive_extend code
1595             int k = ((i << len) & ((1 << FAST_BITS) - 1)) >> (FAST_BITS - magbits);
1596             int m = 1 << (magbits - 1);
1597             if (k < m) k += (~0U << magbits) + 1; // unsigned shift: left-shifting -1 is undefined behavior
1598             // if the result is small enough, we can fit it in fast_ac table
1599             if (k >= -128 && k <= 127)
1600                fast_ac[i] = (stbi__int16) ((k << 8) + (run << 4) + (len + magbits));
1601          }
1602       }
1603    }
1604 }
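// Illustration only (not part of stb_image): a sketch of the 16-bit layout
// that the table above stores, namely value in the top 8 bits, run in the
// next 4, and combined bit length in the low 4. The sketch_* names are
// hypothetical. Because the length is at least 1, a packed entry is never 0,
// which is why 0 can serve as the "no fast path" marker. Two's complement
// integers are assumed.

```c
#include <assert.h>

// Pack a decoded coefficient, zero-run and total bit length into 16 bits.
static short sketch_pack_ac(int value, int run, int len)
{
   assert(value >= -128 && value <= 127);
   assert(run >= 0 && run <= 15);
   assert(len >= 1 && len <= 15);
   return (short) (value * 256 + run * 16 + len);
}

// Field accessors. The value is recovered with an exact division so the
// sketch does not depend on right-shifting a negative number.
static int sketch_ac_len(short r)   { return r & 15; }
static int sketch_ac_run(short r)   { return (r & 255) >> 4; }
static int sketch_ac_value(short r) { return (r - (r & 255)) / 256; }
```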
1605 
1606 static void stbi__grow_buffer_unsafe(stbi__jpeg *j)
1607 {
1608    do {
1609       int b = j->nomore ? 0 : stbi__get8(j->s);
1610       if (b == 0xff) {
1611          int c = stbi__get8(j->s);
1612          if (c != 0) {
1613             j->marker = (unsigned char) c;
1614             j->nomore = 1;
1615             return;
1616          }
1617       }
1618       j->code_buffer |= b << (24 - j->code_bits);
1619       j->code_bits += 8;
1620    } while (j->code_bits <= 24);
1621 }
1622 
1623 // (1 << n) - 1
1624 static stbi__uint32 stbi__bmask[17]={0,1,3,7,15,31,63,127,255,511,1023,2047,4095,8191,16383,32767,65535};
1625 
1626 // decode a jpeg huffman value from the bitstream
1627 stbi_inline static int stbi__jpeg_huff_decode(stbi__jpeg *j, stbi__huffman *h)
1628 {
1629    unsigned int temp;
1630    int c,k;
1631 
1632    if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1633 
1634    // look at the top FAST_BITS and determine what symbol ID it is,
1635    // if the code is <= FAST_BITS
1636    c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
1637    k = h->fast[c];
1638    if (k < 255) {
1639       int s = h->size[k];
1640       if (s > j->code_bits)
1641          return -1;
1642       j->code_buffer <<= s;
1643       j->code_bits -= s;
1644       return h->values[k];
1645    }
1646 
1647    // naive test is to shift the code_buffer down so k bits are
1648    // valid, then test against maxcode. To speed this up, we've
1649    // preshifted maxcode left so that it has (16-k) 0s at the
1650    // end; in other words, regardless of the number of bits, it
1651    // wants to be compared against something shifted to have 16;
1652    // that way we don't need to shift inside the loop.
1653    temp = j->code_buffer >> 16;
1654    for (k=FAST_BITS+1 ; ; ++k)
1655       if (temp < h->maxcode[k])
1656          break;
1657    if (k == 17) {
1658       // error! code not found
1659       j->code_bits -= 16;
1660       return -1;
1661    }
1662 
1663    if (k > j->code_bits)
1664       return -1;
1665 
1666    // convert the huffman code to the symbol id
1667    c = ((j->code_buffer >> (32 - k)) & stbi__bmask[k]) + h->delta[k];
1668    STBI_ASSERT((((j->code_buffer) >> (32 - h->size[c])) & stbi__bmask[h->size[c]]) == h->code[c]);
1669 
1670    // convert the id to a symbol
1671    j->code_bits -= k;
1672    j->code_buffer <<= k;
1673    return h->values[c];
1674 }
1675 
1676 // bias[n] = (-1<<n) + 1
1677 static int const stbi__jbias[16] = {0,-1,-3,-7,-15,-31,-63,-127,-255,-511,-1023,-2047,-4095,-8191,-16383,-32767};
1678 
1679 // combined JPEG 'receive' and JPEG 'extend', since baseline
1680 // always extends everything it receives.
1681 stbi_inline static int stbi__extend_receive(stbi__jpeg *j, int n)
1682 {
1683    unsigned int k;
1684    int sgn;
1685    if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
1686 
1687    sgn = (stbi__int32)j->code_buffer >> 31; // sign bit is always in MSB
1688    k = stbi_lrot(j->code_buffer, n);
1689    STBI_ASSERT(n >= 0 && n < (int) (sizeof(stbi__bmask)/sizeof(*stbi__bmask)));
1690    j->code_buffer = k & ~stbi__bmask[n];
1691    k &= stbi__bmask[n];
1692    j->code_bits -= n;
1693    return k + (stbi__jbias[n] & ~sgn);
1694 }
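// Illustration only (not part of stb_image): the 'extend' half of the
// routine above, separated from the bit buffer. An n-bit field whose top bit
// is clear encodes a negative number and gets the bias (-1<<n) + 1 added,
// otherwise the raw bits are the value. sketch_extend is a hypothetical name.

```c
#include <assert.h>

// Map an n-bit raw field (already read from the stream) to its signed value.
static int sketch_extend(unsigned int raw, int n)
{
   assert(n >= 1 && n <= 15);
   assert(raw < (1u << n));
   if (raw < (1u << (n - 1)))
      return (int) raw - (int) ((1u << n) - 1);   // bias is -(2^n - 1)
   return (int) raw;
}
```

// For n = 3 the raw fields 0..3 map to -7..-4 and 4..7 map to themselves.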
1695 
1696 // get some unsigned bits
1697 stbi_inline static int stbi__jpeg_get_bits(stbi__jpeg *j, int n)
1698 {
1699    unsigned int k;
1700    if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
1701    k = stbi_lrot(j->code_buffer, n);
1702    j->code_buffer = k & ~stbi__bmask[n];
1703    k &= stbi__bmask[n];
1704    j->code_bits -= n;
1705    return k;
1706 }
1707 
1708 stbi_inline static int stbi__jpeg_get_bit(stbi__jpeg *j)
1709 {
1710    unsigned int k;
1711    if (j->code_bits < 1) stbi__grow_buffer_unsafe(j);
1712    k = j->code_buffer;
1713    j->code_buffer <<= 1;
1714    --j->code_bits;
1715    return k & 0x80000000;
1716 }
1717 
1718 // given a value that's at position X in the zigzag stream,
1719 // where does it appear in the 8x8 matrix coded as row-major?
1720 static stbi_uc stbi__jpeg_dezigzag[64+15] =
1721 {
1722     0,  1,  8, 16,  9,  2,  3, 10,
1723    17, 24, 32, 25, 18, 11,  4,  5,
1724    12, 19, 26, 33, 40, 48, 41, 34,
1725    27, 20, 13,  6,  7, 14, 21, 28,
1726    35, 42, 49, 56, 57, 50, 43, 36,
1727    29, 22, 15, 23, 30, 37, 44, 51,
1728    58, 59, 52, 45, 38, 31, 39, 46,
1729    53, 60, 61, 54, 47, 55, 62, 63,
1730    // let corrupt input sample past end
1731    63, 63, 63, 63, 63, 63, 63, 63,
1732    63, 63, 63, 63, 63, 63, 63
1733 };
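// Illustration only (not part of stb_image): the first 64 entries of the
// table above can be regenerated by walking the 8x8 block one anti-diagonal
// at a time and alternating direction. sketch_make_dezigzag is a
// hypothetical name; the 15 padding entries are not produced here.

```c
#include <assert.h>

// Fill order[z] with the row-major index (row*8 + col) of zigzag position z.
static void sketch_make_dezigzag(unsigned char order[64])
{
   int n = 0, d, i;
   for (d = 0; d < 15; ++d) {          // d = row + col
      int lo = d < 8 ? 0 : d - 7;      // smallest row index on this diagonal
      int hi = d < 8 ? d : 7;          // largest row index on this diagonal
      for (i = lo; i <= hi; ++i) {
         // odd diagonals run downward (row increasing), even ones upward
         int row = (d & 1) ? i : (hi + lo - i);
         int col = d - row;
         order[n++] = (unsigned char) (row * 8 + col);
      }
   }
   assert(n == 64);
}
```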
1734 
1735 // decode one 64-entry block--
1736 static int stbi__jpeg_decode_block(stbi__jpeg *j, short data[64], stbi__huffman *hdc, stbi__huffman *hac, stbi__int16 *fac, int b, stbi_uc *dequant)
1737 {
1738    int diff,dc,k;
1739    int t;
1740 
1741    if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1742    t = stbi__jpeg_huff_decode(j, hdc);
1743    if (t < 0) return stbi__err("bad huffman code","Corrupt JPEG");
1744 
1745    // 0 all the ac values now so we can do it 32-bits at a time
1746    memset(data,0,64*sizeof(data[0]));
1747 
1748    diff = t ? stbi__extend_receive(j, t) : 0;
1749    dc = j->img_comp[b].dc_pred + diff;
1750    j->img_comp[b].dc_pred = dc;
1751    data[0] = (short) (dc * dequant[0]);
1752 
1753    // decode AC components, see JPEG spec
1754    k = 1;
1755    do {
1756       unsigned int zig;
1757       int c,r,s;
1758       if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1759       c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
1760       r = fac[c];
1761       if (r) { // fast-AC path
1762          k += (r >> 4) & 15; // run
1763          s = r & 15; // combined length
1764          j->code_buffer <<= s;
1765          j->code_bits -= s;
1766          // decode into unzigzag'd location
1767          zig = stbi__jpeg_dezigzag[k++];
1768          data[zig] = (short) ((r >> 8) * dequant[zig]);
1769       } else {
1770          int rs = stbi__jpeg_huff_decode(j, hac);
1771          if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
1772          s = rs & 15;
1773          r = rs >> 4;
1774          if (s == 0) {
1775             if (rs != 0xf0) break; // end block
1776             k += 16;
1777          } else {
1778             k += r;
1779             // decode into unzigzag'd location
1780             zig = stbi__jpeg_dezigzag[k++];
1781             data[zig] = (short) (stbi__extend_receive(j,s) * dequant[zig]);
1782          }
1783       }
1784    } while (k < 64);
1785    return 1;
1786 }
1787 
1788 static int stbi__jpeg_decode_block_prog_dc(stbi__jpeg *j, short data[64], stbi__huffman *hdc, int b)
1789 {
1790    int diff,dc;
1791    int t;
1792    if (j->spec_end != 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
1793 
1794    if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1795 
1796    if (j->succ_high == 0) {
1797       // first scan for DC coefficient, must be first
1798       memset(data,0,64*sizeof(data[0])); // 0 all the ac values now
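1799       t = stbi__jpeg_huff_decode(j, hdc); if (t < 0) return stbi__err("bad huffman code","Corrupt JPEG"); // a failed decode must not reach stbi__extend_receive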
1800       diff = t ? stbi__extend_receive(j, t) : 0;
1801 
1802       dc = j->img_comp[b].dc_pred + diff;
1803       j->img_comp[b].dc_pred = dc;
1804       data[0] = (short) (dc << j->succ_low);
1805    } else {
1806       // refinement scan for DC coefficient
1807       if (stbi__jpeg_get_bit(j))
1808          data[0] += (short) (1 << j->succ_low);
1809    }
1810    return 1;
1811 }
1812 
1813 // @OPTIMIZE: store non-zigzagged during the decode passes,
1814 // and only de-zigzag when dequantizing
1815 static int stbi__jpeg_decode_block_prog_ac(stbi__jpeg *j, short data[64], stbi__huffman *hac, stbi__int16 *fac)
1816 {
1817    int k;
1818    if (j->spec_start == 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
1819 
1820    if (j->succ_high == 0) {
1821       int shift = j->succ_low;
1822 
1823       if (j->eob_run) {
1824          --j->eob_run;
1825          return 1;
1826       }
1827 
1828       k = j->spec_start;
1829       do {
1830          unsigned int zig;
1831          int c,r,s;
1832          if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1833          c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
1834          r = fac[c];
1835          if (r) { // fast-AC path
1836             k += (r >> 4) & 15; // run
1837             s = r & 15; // combined length
1838             j->code_buffer <<= s;
1839             j->code_bits -= s;
1840             zig = stbi__jpeg_dezigzag[k++];
1841             data[zig] = (short) ((r >> 8) << shift);
1842          } else {
1843             int rs = stbi__jpeg_huff_decode(j, hac);
1844             if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
1845             s = rs & 15;
1846             r = rs >> 4;
1847             if (s == 0) {
1848                if (r < 15) {
1849                   j->eob_run = (1 << r);
1850                   if (r)
1851                      j->eob_run += stbi__jpeg_get_bits(j, r);
1852                   --j->eob_run;
1853                   break;
1854                }
1855                k += 16;
1856             } else {
1857                k += r;
1858                zig = stbi__jpeg_dezigzag[k++];
1859                data[zig] = (short) (stbi__extend_receive(j,s) << shift);
1860             }
1861          }
1862       } while (k <= j->spec_end);
1863    } else {
1864       // refinement scan for these AC coefficients
1865 
1866       short bit = (short) (1 << j->succ_low);
1867 
1868       if (j->eob_run) {
1869          --j->eob_run;
1870          for (k = j->spec_start; k <= j->spec_end; ++k) {
1871             short *p = &data[stbi__jpeg_dezigzag[k]];
1872             if (*p != 0)
1873                if (stbi__jpeg_get_bit(j))
1874                   if ((*p & bit)==0) {
1875                      if (*p > 0)
1876                         *p += bit;
1877                      else
1878                         *p -= bit;
1879                   }
1880          }
1881       } else {
1882          k = j->spec_start;
1883          do {
1884             int r,s;
1885             int rs = stbi__jpeg_huff_decode(j, hac); // @OPTIMIZE see if we can use the fast path here, advance-by-r is so slow, eh
1886             if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
1887             s = rs & 15;
1888             r = rs >> 4;
1889             if (s == 0) {
1890                if (r < 15) {
1891                   j->eob_run = (1 << r) - 1;
1892                   if (r)
1893                      j->eob_run += stbi__jpeg_get_bits(j, r);
1894                   r = 64; // force end of block
1895                } else {
1896                   // r=15 s=0 should write 16 0s, so we just do
1897                   // a run of 15 0s and then write s (which is 0),
1898                   // so we don't have to do anything special here
1899                }
1900             } else {
1901                if (s != 1) return stbi__err("bad huffman code", "Corrupt JPEG");
1902                // sign bit
1903                if (stbi__jpeg_get_bit(j))
1904                   s = bit;
1905                else
1906                   s = -bit;
1907             }
1908 
1909             // advance by r
1910             while (k <= j->spec_end) {
1911                short *p = &data[stbi__jpeg_dezigzag[k++]];
1912                if (*p != 0) {
1913                   if (stbi__jpeg_get_bit(j))
1914                      if ((*p & bit)==0) {
1915                         if (*p > 0)
1916                            *p += bit;
1917                         else
1918                            *p -= bit;
1919                      }
1920                } else {
1921                   if (r == 0) {
1922                      *p = (short) s;
1923                      break;
1924                   }
1925                   --r;
1926                }
1927             }
1928          } while (k <= j->spec_end);
1929       }
1930    }
1931    return 1;
1932 }
1933 
1934 // clamp an int to the 0..255 range of a stbi_uc
1935 stbi_inline static stbi_uc stbi__clamp(int x)
1936 {
1937    // trick to use a single test to catch both cases
1938    if ((unsigned int) x > 255) {
1939       if (x < 0) return 0;
1940       if (x > 255) return 255;
1941    }
1942    return (stbi_uc) x;
1943 }
1944 
1945 #define stbi__f2f(x)  ((int) (((x) * 4096 + 0.5)))
1946 #define stbi__fsh(x)  ((x) << 12)
1947 
1948 // derived from jidctint -- DCT_ISLOW
1949 #define STBI__IDCT_1D(s0,s1,s2,s3,s4,s5,s6,s7) \
1950    int t0,t1,t2,t3,p1,p2,p3,p4,p5,x0,x1,x2,x3; \
1951    p2 = s2;                                    \
1952    p3 = s6;                                    \
1953    p1 = (p2+p3) * stbi__f2f(0.5411961f);       \
1954    t2 = p1 + p3*stbi__f2f(-1.847759065f);      \
1955    t3 = p1 + p2*stbi__f2f( 0.765366865f);      \
1956    p2 = s0;                                    \
1957    p3 = s4;                                    \
1958    t0 = stbi__fsh(p2+p3);                      \
1959    t1 = stbi__fsh(p2-p3);                      \
1960    x0 = t0+t3;                                 \
1961    x3 = t0-t3;                                 \
1962    x1 = t1+t2;                                 \
1963    x2 = t1-t2;                                 \
1964    t0 = s7;                                    \
1965    t1 = s5;                                    \
1966    t2 = s3;                                    \
1967    t3 = s1;                                    \
1968    p3 = t0+t2;                                 \
1969    p4 = t1+t3;                                 \
1970    p1 = t0+t3;                                 \
1971    p2 = t1+t2;                                 \
1972    p5 = (p3+p4)*stbi__f2f( 1.175875602f);      \
1973    t0 = t0*stbi__f2f( 0.298631336f);           \
1974    t1 = t1*stbi__f2f( 2.053119869f);           \
1975    t2 = t2*stbi__f2f( 3.072711026f);           \
1976    t3 = t3*stbi__f2f( 1.501321110f);           \
1977    p1 = p5 + p1*stbi__f2f(-0.899976223f);      \
1978    p2 = p5 + p2*stbi__f2f(-2.562915447f);      \
1979    p3 = p3*stbi__f2f(-1.961570560f);           \
1980    p4 = p4*stbi__f2f(-0.390180644f);           \
1981    t3 += p1+p4;                                \
1982    t2 += p2+p3;                                \
1983    t1 += p2+p4;                                \
1984    t0 += p1+p3;
1985 
1986 static void stbi__idct_block(stbi_uc *out, int out_stride, short data[64])
1987 {
1988    int i,val[64],*v=val;
1989    stbi_uc *o;
1990    short *d = data;
1991 
1992    // columns
1993    for (i=0; i < 8; ++i,++d, ++v) {
1994       // if all zeroes, shortcut -- this avoids dequantizing 0s and IDCTing
1995       if (d[ 8]==0 && d[16]==0 && d[24]==0 && d[32]==0
1996            && d[40]==0 && d[48]==0 && d[56]==0) {
1997          //    no shortcut                 0     seconds
1998          //    (1|2|3|4|5|6|7)==0          0     seconds
1999          //    all separate               -0.047 seconds
2000          //    1 && 2|3 && 4|5 && 6|7:    -0.047 seconds
2001          int dcterm = d[0] << 2;
2002          v[0] = v[8] = v[16] = v[24] = v[32] = v[40] = v[48] = v[56] = dcterm;
2003       } else {
2004          STBI__IDCT_1D(d[ 0],d[ 8],d[16],d[24],d[32],d[40],d[48],d[56])
2005          // constants scaled things up by 1<<12; let's bring them back
2006          // down, but keep 2 extra bits of precision
2007          x0 += 512; x1 += 512; x2 += 512; x3 += 512;
2008          v[ 0] = (x0+t3) >> 10;
2009          v[56] = (x0-t3) >> 10;
2010          v[ 8] = (x1+t2) >> 10;
2011          v[48] = (x1-t2) >> 10;
2012          v[16] = (x2+t1) >> 10;
2013          v[40] = (x2-t1) >> 10;
2014          v[24] = (x3+t0) >> 10;
2015          v[32] = (x3-t0) >> 10;
2016       }
2017    }
2018 
2019    for (i=0, v=val, o=out; i < 8; ++i,v+=8,o+=out_stride) {
2020       // no fast case since the first 1D IDCT spread components out
2021       STBI__IDCT_1D(v[0],v[1],v[2],v[3],v[4],v[5],v[6],v[7])
2022       // constants scaled things up by 1<<12, plus we had 1<<2 from first
2023       // loop, plus horizontal and vertical each scale by sqrt(8) so together
2024       // we've got an extra 1<<3, so 1<<17 total we need to remove.
2025       // so we want to round that, which means adding 0.5 * 1<<17,
2026       // aka 65536. Also, we'll end up with -128 to 127 that we want
2027       // to encode as 0..255 by adding 128, so we'll add that before the shift
2028       x0 += 65536 + (128<<17);
2029       x1 += 65536 + (128<<17);
2030       x2 += 65536 + (128<<17);
2031       x3 += 65536 + (128<<17);
2032       // tried computing the shifts into temps, or'ing the temps to see
2033       // if any were out of range, but that was slower
2034       o[0] = stbi__clamp((x0+t3) >> 17);
2035       o[7] = stbi__clamp((x0-t3) >> 17);
2036       o[1] = stbi__clamp((x1+t2) >> 17);
2037       o[6] = stbi__clamp((x1-t2) >> 17);
2038       o[2] = stbi__clamp((x2+t1) >> 17);
2039       o[5] = stbi__clamp((x2-t1) >> 17);
2040       o[3] = stbi__clamp((x3+t0) >> 17);
2041       o[4] = stbi__clamp((x3-t0) >> 17);
2042    }
2043 }
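// Illustration only (not part of stb_image): a worked instance of the
// scaling arithmetic described in the comments above, for a block whose only
// non-zero coefficient is the dequantized DC term. The column shortcut gives
// dc << 2, the row pass scales by 1 << 12, and the final shift removes
// 1 << 17, so every output sample is 128 + round(dc / 8) after clamping.
// The sketch_* names are hypothetical, and dc is restricted to -1024..1023
// (an assumption) so the sum stays non-negative before the shift.

```c
#include <assert.h>

static unsigned char sketch_clamp(int x)
{
   // a negative x becomes a huge unsigned value, so one test catches both ends
   if ((unsigned int) x > 255)
      return x < 0 ? 0 : 255;
   return (unsigned char) x;
}

// Output sample produced for a DC-only block.
static unsigned char sketch_dc_only_sample(int dc)
{
   int x;
   assert(dc >= -1024 && dc <= 1023);
   x = dc * 4;                 // column pass shortcut: d[0] << 2
   x = x * 4096;               // row pass: scaled by 1 << 12, all other inputs are 0
   x += 65536 + (128 << 17);   // rounding term plus the +128 level shift
   return sketch_clamp(x >> 17);
}
```

// For example dc = 80 yields 138 and dc = 84 yields 139.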
2044 
2045 #ifdef STBI_SSE2
2046 // sse2 integer IDCT. not the fastest possible implementation but it
2047 // produces bit-identical results to the generic C version so it's
2048 // fully "transparent".
2049 static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
2050 {
2051    // This is constructed to match our regular (generic) integer IDCT exactly.
2052    __m128i row0, row1, row2, row3, row4, row5, row6, row7;
2053    __m128i tmp;
2054 
2055    // dot product constant: even elems=x, odd elems=y
2056    #define dct_const(x,y)  _mm_setr_epi16((x),(y),(x),(y),(x),(y),(x),(y))
2057 
2058    // out(0) = c0[even]*x + c0[odd]*y   (c0, x, y 16-bit, out 32-bit)
2059    // out(1) = c1[even]*x + c1[odd]*y
2060    #define dct_rot(out0,out1, x,y,c0,c1) \
2061       __m128i c0##lo = _mm_unpacklo_epi16((x),(y)); \
2062       __m128i c0##hi = _mm_unpackhi_epi16((x),(y)); \
2063       __m128i out0##_l = _mm_madd_epi16(c0##lo, c0); \
2064       __m128i out0##_h = _mm_madd_epi16(c0##hi, c0); \
2065       __m128i out1##_l = _mm_madd_epi16(c0##lo, c1); \
2066       __m128i out1##_h = _mm_madd_epi16(c0##hi, c1)
2067 
2068    // out = in << 12  (in 16-bit, out 32-bit)
2069    #define dct_widen(out, in) \
2070       __m128i out##_l = _mm_srai_epi32(_mm_unpacklo_epi16(_mm_setzero_si128(), (in)), 4); \
2071       __m128i out##_h = _mm_srai_epi32(_mm_unpackhi_epi16(_mm_setzero_si128(), (in)), 4)
2072 
2073    // wide add
2074    #define dct_wadd(out, a, b) \
2075       __m128i out##_l = _mm_add_epi32(a##_l, b##_l); \
2076       __m128i out##_h = _mm_add_epi32(a##_h, b##_h)
2077 
2078    // wide sub
2079    #define dct_wsub(out, a, b) \
2080       __m128i out##_l = _mm_sub_epi32(a##_l, b##_l); \
2081       __m128i out##_h = _mm_sub_epi32(a##_h, b##_h)
2082 
2083    // butterfly a/b, add bias, then shift by "s" and pack
2084    #define dct_bfly32o(out0, out1, a,b,bias,s) \
2085       { \
2086          __m128i abiased_l = _mm_add_epi32(a##_l, bias); \
2087          __m128i abiased_h = _mm_add_epi32(a##_h, bias); \
2088          dct_wadd(sum, abiased, b); \
2089          dct_wsub(dif, abiased, b); \
2090          out0 = _mm_packs_epi32(_mm_srai_epi32(sum_l, s), _mm_srai_epi32(sum_h, s)); \
2091          out1 = _mm_packs_epi32(_mm_srai_epi32(dif_l, s), _mm_srai_epi32(dif_h, s)); \
2092       }
2093 
2094    // 8-bit interleave step (for transposes)
2095    #define dct_interleave8(a, b) \
2096       tmp = a; \
2097       a = _mm_unpacklo_epi8(a, b); \
2098       b = _mm_unpackhi_epi8(tmp, b)
2099 
2100    // 16-bit interleave step (for transposes)
2101    #define dct_interleave16(a, b) \
2102       tmp = a; \
2103       a = _mm_unpacklo_epi16(a, b); \
2104       b = _mm_unpackhi_epi16(tmp, b)
2105 
2106    #define dct_pass(bias,shift) \
2107       { \
2108          /* even part */ \
2109          dct_rot(t2e,t3e, row2,row6, rot0_0,rot0_1); \
2110          __m128i sum04 = _mm_add_epi16(row0, row4); \
2111          __m128i dif04 = _mm_sub_epi16(row0, row4); \
2112          dct_widen(t0e, sum04); \
2113          dct_widen(t1e, dif04); \
2114          dct_wadd(x0, t0e, t3e); \
2115          dct_wsub(x3, t0e, t3e); \
2116          dct_wadd(x1, t1e, t2e); \
2117          dct_wsub(x2, t1e, t2e); \
2118          /* odd part */ \
2119          dct_rot(y0o,y2o, row7,row3, rot2_0,rot2_1); \
2120          dct_rot(y1o,y3o, row5,row1, rot3_0,rot3_1); \
2121          __m128i sum17 = _mm_add_epi16(row1, row7); \
2122          __m128i sum35 = _mm_add_epi16(row3, row5); \
2123          dct_rot(y4o,y5o, sum17,sum35, rot1_0,rot1_1); \
2124          dct_wadd(x4, y0o, y4o); \
2125          dct_wadd(x5, y1o, y5o); \
2126          dct_wadd(x6, y2o, y5o); \
2127          dct_wadd(x7, y3o, y4o); \
2128          dct_bfly32o(row0,row7, x0,x7,bias,shift); \
2129          dct_bfly32o(row1,row6, x1,x6,bias,shift); \
2130          dct_bfly32o(row2,row5, x2,x5,bias,shift); \
2131          dct_bfly32o(row3,row4, x3,x4,bias,shift); \
2132       }
2133 
2134    __m128i rot0_0 = dct_const(stbi__f2f(0.5411961f), stbi__f2f(0.5411961f) + stbi__f2f(-1.847759065f));
2135    __m128i rot0_1 = dct_const(stbi__f2f(0.5411961f) + stbi__f2f( 0.765366865f), stbi__f2f(0.5411961f));
2136    __m128i rot1_0 = dct_const(stbi__f2f(1.175875602f) + stbi__f2f(-0.899976223f), stbi__f2f(1.175875602f));
2137    __m128i rot1_1 = dct_const(stbi__f2f(1.175875602f), stbi__f2f(1.175875602f) + stbi__f2f(-2.562915447f));
2138    __m128i rot2_0 = dct_const(stbi__f2f(-1.961570560f) + stbi__f2f( 0.298631336f), stbi__f2f(-1.961570560f));
2139    __m128i rot2_1 = dct_const(stbi__f2f(-1.961570560f), stbi__f2f(-1.961570560f) + stbi__f2f( 3.072711026f));
2140    __m128i rot3_0 = dct_const(stbi__f2f(-0.390180644f) + stbi__f2f( 2.053119869f), stbi__f2f(-0.390180644f));
2141    __m128i rot3_1 = dct_const(stbi__f2f(-0.390180644f), stbi__f2f(-0.390180644f) + stbi__f2f( 1.501321110f));
2142 
2143    // rounding biases in column/row passes, see stbi__idct_block for explanation.
2144    __m128i bias_0 = _mm_set1_epi32(512);
2145    __m128i bias_1 = _mm_set1_epi32(65536 + (128<<17));
2146 
2147    // load
2148    row0 = _mm_load_si128((const __m128i *) (data + 0*8));
2149    row1 = _mm_load_si128((const __m128i *) (data + 1*8));
2150    row2 = _mm_load_si128((const __m128i *) (data + 2*8));
2151    row3 = _mm_load_si128((const __m128i *) (data + 3*8));
2152    row4 = _mm_load_si128((const __m128i *) (data + 4*8));
2153    row5 = _mm_load_si128((const __m128i *) (data + 5*8));
2154    row6 = _mm_load_si128((const __m128i *) (data + 6*8));
2155    row7 = _mm_load_si128((const __m128i *) (data + 7*8));
2156 
2157    // column pass
2158    dct_pass(bias_0, 10);
2159 
2160    {
2161       // 16bit 8x8 transpose pass 1
2162       dct_interleave16(row0, row4);
2163       dct_interleave16(row1, row5);
2164       dct_interleave16(row2, row6);
2165       dct_interleave16(row3, row7);
2166 
2167       // transpose pass 2
2168       dct_interleave16(row0, row2);
2169       dct_interleave16(row1, row3);
2170       dct_interleave16(row4, row6);
2171       dct_interleave16(row5, row7);
2172 
2173       // transpose pass 3
2174       dct_interleave16(row0, row1);
2175       dct_interleave16(row2, row3);
2176       dct_interleave16(row4, row5);
2177       dct_interleave16(row6, row7);
2178    }
2179 
2180    // row pass
2181    dct_pass(bias_1, 17);
2182 
2183    {
2184       // pack
2185       __m128i p0 = _mm_packus_epi16(row0, row1); // a0a1a2a3...a7b0b1b2b3...b7
2186       __m128i p1 = _mm_packus_epi16(row2, row3);
2187       __m128i p2 = _mm_packus_epi16(row4, row5);
2188       __m128i p3 = _mm_packus_epi16(row6, row7);
2189 
2190       // 8bit 8x8 transpose pass 1
2191       dct_interleave8(p0, p2); // a0e0a1e1...
2192       dct_interleave8(p1, p3); // c0g0c1g1...
2193 
2194       // transpose pass 2
2195       dct_interleave8(p0, p1); // a0c0e0g0...
2196       dct_interleave8(p2, p3); // b0d0f0h0...
2197 
2198       // transpose pass 3
2199       dct_interleave8(p0, p2); // a0b0c0d0...
2200       dct_interleave8(p1, p3); // a4b4c4d4...
2201 
2202       // store
2203       _mm_storel_epi64((__m128i *) out, p0); out += out_stride;
2204       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p0, 0x4e)); out += out_stride;
2205       _mm_storel_epi64((__m128i *) out, p2); out += out_stride;
2206       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p2, 0x4e)); out += out_stride;
2207       _mm_storel_epi64((__m128i *) out, p1); out += out_stride;
2208       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p1, 0x4e)); out += out_stride;
2209       _mm_storel_epi64((__m128i *) out, p3); out += out_stride;
2210       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p3, 0x4e));
2211    }
2212 
2213 #undef dct_const
2214 #undef dct_rot
2215 #undef dct_widen
2216 #undef dct_wadd
2217 #undef dct_wsub
2218 #undef dct_bfly32o
2219 #undef dct_interleave8
2220 #undef dct_interleave16
2221 #undef dct_pass
2222 }
2223 
2224 #endif // STBI_SSE2
2225 
2226 #ifdef STBI_NEON
2227 
2228 // NEON integer IDCT. should produce bit-identical
2229 // results to the generic C version.
2230 static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
2231 {
2232    int16x8_t row0, row1, row2, row3, row4, row5, row6, row7;
2233 
2234    int16x4_t rot0_0 = vdup_n_s16(stbi__f2f(0.5411961f));
2235    int16x4_t rot0_1 = vdup_n_s16(stbi__f2f(-1.847759065f));
2236    int16x4_t rot0_2 = vdup_n_s16(stbi__f2f( 0.765366865f));
2237    int16x4_t rot1_0 = vdup_n_s16(stbi__f2f( 1.175875602f));
2238    int16x4_t rot1_1 = vdup_n_s16(stbi__f2f(-0.899976223f));
2239    int16x4_t rot1_2 = vdup_n_s16(stbi__f2f(-2.562915447f));
2240    int16x4_t rot2_0 = vdup_n_s16(stbi__f2f(-1.961570560f));
2241    int16x4_t rot2_1 = vdup_n_s16(stbi__f2f(-0.390180644f));
2242    int16x4_t rot3_0 = vdup_n_s16(stbi__f2f( 0.298631336f));
2243    int16x4_t rot3_1 = vdup_n_s16(stbi__f2f( 2.053119869f));
2244    int16x4_t rot3_2 = vdup_n_s16(stbi__f2f( 3.072711026f));
2245    int16x4_t rot3_3 = vdup_n_s16(stbi__f2f( 1.501321110f));
2246 
2247 #define dct_long_mul(out, inq, coeff) \
2248    int32x4_t out##_l = vmull_s16(vget_low_s16(inq), coeff); \
2249    int32x4_t out##_h = vmull_s16(vget_high_s16(inq), coeff)
2250 
2251 #define dct_long_mac(out, acc, inq, coeff) \
2252    int32x4_t out##_l = vmlal_s16(acc##_l, vget_low_s16(inq), coeff); \
2253    int32x4_t out##_h = vmlal_s16(acc##_h, vget_high_s16(inq), coeff)
2254 
2255 #define dct_widen(out, inq) \
2256    int32x4_t out##_l = vshll_n_s16(vget_low_s16(inq), 12); \
2257    int32x4_t out##_h = vshll_n_s16(vget_high_s16(inq), 12)
2258 
2259 // wide add
2260 #define dct_wadd(out, a, b) \
2261    int32x4_t out##_l = vaddq_s32(a##_l, b##_l); \
2262    int32x4_t out##_h = vaddq_s32(a##_h, b##_h)
2263 
2264 // wide sub
2265 #define dct_wsub(out, a, b) \
2266    int32x4_t out##_l = vsubq_s32(a##_l, b##_l); \
2267    int32x4_t out##_h = vsubq_s32(a##_h, b##_h)
2268 
2269 // butterfly a/b, then shift using "shiftop" by "s" and pack
2270 #define dct_bfly32o(out0,out1, a,b,shiftop,s) \
2271    { \
2272       dct_wadd(sum, a, b); \
2273       dct_wsub(dif, a, b); \
2274       out0 = vcombine_s16(shiftop(sum_l, s), shiftop(sum_h, s)); \
2275       out1 = vcombine_s16(shiftop(dif_l, s), shiftop(dif_h, s)); \
2276    }
2277 
2278 #define dct_pass(shiftop, shift) \
2279    { \
2280       /* even part */ \
2281       int16x8_t sum26 = vaddq_s16(row2, row6); \
2282       dct_long_mul(p1e, sum26, rot0_0); \
2283       dct_long_mac(t2e, p1e, row6, rot0_1); \
2284       dct_long_mac(t3e, p1e, row2, rot0_2); \
2285       int16x8_t sum04 = vaddq_s16(row0, row4); \
2286       int16x8_t dif04 = vsubq_s16(row0, row4); \
2287       dct_widen(t0e, sum04); \
2288       dct_widen(t1e, dif04); \
2289       dct_wadd(x0, t0e, t3e); \
2290       dct_wsub(x3, t0e, t3e); \
2291       dct_wadd(x1, t1e, t2e); \
2292       dct_wsub(x2, t1e, t2e); \
2293       /* odd part */ \
2294       int16x8_t sum15 = vaddq_s16(row1, row5); \
2295       int16x8_t sum17 = vaddq_s16(row1, row7); \
2296       int16x8_t sum35 = vaddq_s16(row3, row5); \
2297       int16x8_t sum37 = vaddq_s16(row3, row7); \
2298       int16x8_t sumodd = vaddq_s16(sum17, sum35); \
2299       dct_long_mul(p5o, sumodd, rot1_0); \
2300       dct_long_mac(p1o, p5o, sum17, rot1_1); \
2301       dct_long_mac(p2o, p5o, sum35, rot1_2); \
2302       dct_long_mul(p3o, sum37, rot2_0); \
2303       dct_long_mul(p4o, sum15, rot2_1); \
2304       dct_wadd(sump13o, p1o, p3o); \
2305       dct_wadd(sump24o, p2o, p4o); \
2306       dct_wadd(sump23o, p2o, p3o); \
2307       dct_wadd(sump14o, p1o, p4o); \
2308       dct_long_mac(x4, sump13o, row7, rot3_0); \
2309       dct_long_mac(x5, sump24o, row5, rot3_1); \
2310       dct_long_mac(x6, sump23o, row3, rot3_2); \
2311       dct_long_mac(x7, sump14o, row1, rot3_3); \
2312       dct_bfly32o(row0,row7, x0,x7,shiftop,shift); \
2313       dct_bfly32o(row1,row6, x1,x6,shiftop,shift); \
2314       dct_bfly32o(row2,row5, x2,x5,shiftop,shift); \
2315       dct_bfly32o(row3,row4, x3,x4,shiftop,shift); \
2316    }
2317 
2318    // load
2319    row0 = vld1q_s16(data + 0*8);
2320    row1 = vld1q_s16(data + 1*8);
2321    row2 = vld1q_s16(data + 2*8);
2322    row3 = vld1q_s16(data + 3*8);
2323    row4 = vld1q_s16(data + 4*8);
2324    row5 = vld1q_s16(data + 5*8);
2325    row6 = vld1q_s16(data + 6*8);
2326    row7 = vld1q_s16(data + 7*8);
2327 
2328    // add DC bias
2329    row0 = vaddq_s16(row0, vsetq_lane_s16(1024, vdupq_n_s16(0), 0));
2330 
2331    // column pass
2332    dct_pass(vrshrn_n_s32, 10);
2333 
2334    // 16bit 8x8 transpose
2335    {
2336 // these three map to a single VTRN.16, VTRN.32, and VSWP, respectively.
2337 // whether compilers actually get this is another story, sadly.
2338 #define dct_trn16(x, y) { int16x8x2_t t = vtrnq_s16(x, y); x = t.val[0]; y = t.val[1]; }
2339 #define dct_trn32(x, y) { int32x4x2_t t = vtrnq_s32(vreinterpretq_s32_s16(x), vreinterpretq_s32_s16(y)); x = vreinterpretq_s16_s32(t.val[0]); y = vreinterpretq_s16_s32(t.val[1]); }
2340 #define dct_trn64(x, y) { int16x8_t x0 = x; int16x8_t y0 = y; x = vcombine_s16(vget_low_s16(x0), vget_low_s16(y0)); y = vcombine_s16(vget_high_s16(x0), vget_high_s16(y0)); }
2341 
2342       // pass 1
2343       dct_trn16(row0, row1); // a0b0a2b2a4b4a6b6
2344       dct_trn16(row2, row3);
2345       dct_trn16(row4, row5);
2346       dct_trn16(row6, row7);
2347 
2348       // pass 2
2349       dct_trn32(row0, row2); // a0b0c0d0a4b4c4d4
2350       dct_trn32(row1, row3);
2351       dct_trn32(row4, row6);
2352       dct_trn32(row5, row7);
2353 
2354       // pass 3
2355       dct_trn64(row0, row4); // a0b0c0d0e0f0g0h0
2356       dct_trn64(row1, row5);
2357       dct_trn64(row2, row6);
2358       dct_trn64(row3, row7);
2359 
2360 #undef dct_trn16
2361 #undef dct_trn32
2362 #undef dct_trn64
2363    }
2364 
2365    // row pass
2366    // vrshrn_n_s32 only supports shifts up to 16, we need
2367    // 17. so do a non-rounding shift of 16 first then follow
2368    // up with a rounding shift by 1.
2369    dct_pass(vshrn_n_s32, 16);
2370 
2371    {
2372       // pack and round
2373       uint8x8_t p0 = vqrshrun_n_s16(row0, 1);
2374       uint8x8_t p1 = vqrshrun_n_s16(row1, 1);
2375       uint8x8_t p2 = vqrshrun_n_s16(row2, 1);
2376       uint8x8_t p3 = vqrshrun_n_s16(row3, 1);
2377       uint8x8_t p4 = vqrshrun_n_s16(row4, 1);
2378       uint8x8_t p5 = vqrshrun_n_s16(row5, 1);
2379       uint8x8_t p6 = vqrshrun_n_s16(row6, 1);
2380       uint8x8_t p7 = vqrshrun_n_s16(row7, 1);
2381 
2382       // again, these can translate into one instruction, but often don't.
2383 #define dct_trn8_8(x, y) { uint8x8x2_t t = vtrn_u8(x, y); x = t.val[0]; y = t.val[1]; }
2384 #define dct_trn8_16(x, y) { uint16x4x2_t t = vtrn_u16(vreinterpret_u16_u8(x), vreinterpret_u16_u8(y)); x = vreinterpret_u8_u16(t.val[0]); y = vreinterpret_u8_u16(t.val[1]); }
2385 #define dct_trn8_32(x, y) { uint32x2x2_t t = vtrn_u32(vreinterpret_u32_u8(x), vreinterpret_u32_u8(y)); x = vreinterpret_u8_u32(t.val[0]); y = vreinterpret_u8_u32(t.val[1]); }
2386 
2387       // sadly can't use interleaved stores here since we only write
2388       // 8 bytes to each scan line!
2389 
2390       // 8x8 8-bit transpose pass 1
2391       dct_trn8_8(p0, p1);
2392       dct_trn8_8(p2, p3);
2393       dct_trn8_8(p4, p5);
2394       dct_trn8_8(p6, p7);
2395 
2396       // pass 2
2397       dct_trn8_16(p0, p2);
2398       dct_trn8_16(p1, p3);
2399       dct_trn8_16(p4, p6);
2400       dct_trn8_16(p5, p7);
2401 
2402       // pass 3
2403       dct_trn8_32(p0, p4);
2404       dct_trn8_32(p1, p5);
2405       dct_trn8_32(p2, p6);
2406       dct_trn8_32(p3, p7);
2407 
2408       // store
2409       vst1_u8(out, p0); out += out_stride;
2410       vst1_u8(out, p1); out += out_stride;
2411       vst1_u8(out, p2); out += out_stride;
2412       vst1_u8(out, p3); out += out_stride;
2413       vst1_u8(out, p4); out += out_stride;
2414       vst1_u8(out, p5); out += out_stride;
2415       vst1_u8(out, p6); out += out_stride;
2416       vst1_u8(out, p7);
2417 
2418 #undef dct_trn8_8
2419 #undef dct_trn8_16
2420 #undef dct_trn8_32
2421    }
2422 
2423 #undef dct_long_mul
2424 #undef dct_long_mac
2425 #undef dct_widen
2426 #undef dct_wadd
2427 #undef dct_wsub
2428 #undef dct_bfly32o
2429 #undef dct_pass
2430 }
2431 
2432 #endif // STBI_NEON
2433 
2434 #define STBI__MARKER_none  0xff
2435 // if there's a pending marker from the entropy stream, return that
2436 // otherwise, fetch from the stream and get a marker. if there's no
2437 // marker, return 0xff, which is never a valid marker value
2438 static stbi_uc stbi__get_marker(stbi__jpeg *j)
2439 {
2440    stbi_uc x;
2441    if (j->marker != STBI__MARKER_none) { x = j->marker; j->marker = STBI__MARKER_none; return x; }
2442    x = stbi__get8(j->s);
2443    if (x != 0xff) return STBI__MARKER_none;
2444    while (x == 0xff)
2445       x = stbi__get8(j->s);
2446    return x;
2447 }
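
// Illustrative sketch, not part of stb_image: the marker scan above restated
// over a plain memory buffer, so the 0xff fill-byte handling can be tried in
// isolation. The names sketch_next_marker and SKETCH_MARKER_NONE are
// hypothetical. One assumption differs from the function above: running out
// of data is reported here as "no marker".

```c
#include <stddef.h>

#define SKETCH_MARKER_NONE 0xff  /* hypothetical; mirrors STBI__MARKER_none */

/* Read one marker from buf starting at *pos and advance *pos past it.
   Returns the marker code, or SKETCH_MARKER_NONE when the next byte is
   not 0xff or the buffer ends before a marker code is found. */
static unsigned char sketch_next_marker(const unsigned char *buf, size_t len, size_t *pos)
{
   unsigned char x;
   if (*pos >= len) return SKETCH_MARKER_NONE;
   x = buf[(*pos)++];
   if (x != 0xff) return SKETCH_MARKER_NONE;
   /* a marker may be preceded by any number of 0xff fill bytes */
   while (x == 0xff) {
      if (*pos >= len) return SKETCH_MARKER_NONE;
      x = buf[(*pos)++];
   }
   return x;
}
```

// For the bytes ff ff d8 this returns 0xd8 (SOI) and leaves *pos at 3.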
2448 
2449 // in each scan, we'll have scan_n components, and the order
2450 // of the components is specified by order[]
2451 #define STBI__RESTART(x)     ((x) >= 0xd0 && (x) <= 0xd7)
2452 
2453 // after a restart interval, reset the entropy decoder and
2454 // the dc prediction
2455 static void stbi__jpeg_reset(stbi__jpeg *j)
2456 {
2457    j->code_bits = 0;
2458    j->code_buffer = 0;
2459    j->nomore = 0;
2460    j->img_comp[0].dc_pred = j->img_comp[1].dc_pred = j->img_comp[2].dc_pred = 0;
2461    j->marker = STBI__MARKER_none;
2462    j->todo = j->restart_interval ? j->restart_interval : 0x7fffffff;
2463    j->eob_run = 0;
2464    // no more than 1<<31 MCUs if no restart_interval? that's plenty safe,
2465    // since we don't even allow 1<<30 pixels
2466 }
2467 
2468 static int stbi__parse_entropy_coded_data(stbi__jpeg *z)
2469 {
2470    stbi__jpeg_reset(z);
2471    if (!z->progressive) {
2472       if (z->scan_n == 1) {
2473          int i,j;
2474          STBI_SIMD_ALIGN(short, data[64]);
2475          int n = z->order[0];
2476          // non-interleaved data, we just need to process one block at a time,
2477          // in trivial scanline order
2478          // number of blocks to do just depends on how many actual "pixels" this
2479          // component has, independent of interleaved MCU blocking and such
2480          int w = (z->img_comp[n].x+7) >> 3;
2481          int h = (z->img_comp[n].y+7) >> 3;
2482          for (j=0; j < h; ++j) {
2483             for (i=0; i < w; ++i) {
2484                int ha = z->img_comp[n].ha;
2485                if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
2486                z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
2487                // every data block is an MCU, so countdown the restart interval
2488                if (--z->todo <= 0) {
2489                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2490                   // if it's NOT a restart, then just bail, so we get corrupt data
2491                   // rather than no data
2492                   if (!STBI__RESTART(z->marker)) return 1;
2493                   stbi__jpeg_reset(z);
2494                }
2495             }
2496          }
2497          return 1;
2498       } else { // interleaved
2499          int i,j,k,x,y;
2500          STBI_SIMD_ALIGN(short, data[64]);
2501          for (j=0; j < z->img_mcu_y; ++j) {
2502             for (i=0; i < z->img_mcu_x; ++i) {
2503                // scan an interleaved mcu... process scan_n components in order
2504                for (k=0; k < z->scan_n; ++k) {
2505                   int n = z->order[k];
2506                   // scan out an mcu's worth of this component; that's just determined
2507                   // by the basic H and V specified for the component
2508                   for (y=0; y < z->img_comp[n].v; ++y) {
2509                      for (x=0; x < z->img_comp[n].h; ++x) {
2510                         int x2 = (i*z->img_comp[n].h + x)*8;
2511                         int y2 = (j*z->img_comp[n].v + y)*8;
2512                         int ha = z->img_comp[n].ha;
2513                         if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
2514                         z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*y2+x2, z->img_comp[n].w2, data);
2515                      }
2516                   }
2517                }
2518                // after all interleaved components, that's an interleaved MCU,
2519                // so now count down the restart interval
2520                if (--z->todo <= 0) {
2521                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2522                   if (!STBI__RESTART(z->marker)) return 1;
2523                   stbi__jpeg_reset(z);
2524                }
2525             }
2526          }
2527          return 1;
2528       }
2529    } else {
2530       if (z->scan_n == 1) {
2531          int i,j;
2532          int n = z->order[0];
2533          // non-interleaved data, we just need to process one block at a time,
2534          // in trivial scanline order
2535          // number of blocks to do just depends on how many actual "pixels" this
2536          // component has, independent of interleaved MCU blocking and such
2537          int w = (z->img_comp[n].x+7) >> 3;
2538          int h = (z->img_comp[n].y+7) >> 3;
2539          for (j=0; j < h; ++j) {
2540             for (i=0; i < w; ++i) {
2541                short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
2542                if (z->spec_start == 0) {
2543                   if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
2544                      return 0;
2545                } else {
2546                   int ha = z->img_comp[n].ha;
2547                   if (!stbi__jpeg_decode_block_prog_ac(z, data, &z->huff_ac[ha], z->fast_ac[ha]))
2548                      return 0;
2549                }
2550                // every data block is an MCU, so countdown the restart interval
2551                if (--z->todo <= 0) {
2552                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2553                   if (!STBI__RESTART(z->marker)) return 1;
2554                   stbi__jpeg_reset(z);
2555                }
2556             }
2557          }
2558          return 1;
2559       } else { // interleaved
2560          int i,j,k,x,y;
2561          for (j=0; j < z->img_mcu_y; ++j) {
2562             for (i=0; i < z->img_mcu_x; ++i) {
2563                // scan an interleaved mcu... process scan_n components in order
2564                for (k=0; k < z->scan_n; ++k) {
2565                   int n = z->order[k];
2566                   // scan out an mcu's worth of this component; that's just determined
2567                   // by the basic H and V specified for the component
2568                   for (y=0; y < z->img_comp[n].v; ++y) {
2569                      for (x=0; x < z->img_comp[n].h; ++x) {
2570                         int x2 = (i*z->img_comp[n].h + x);
2571                         int y2 = (j*z->img_comp[n].v + y);
2572                         short *data = z->img_comp[n].coeff + 64 * (x2 + y2 * z->img_comp[n].coeff_w);
2573                         if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
2574                            return 0;
2575                      }
2576                   }
2577                }
2578                // after all interleaved components, that's an interleaved MCU,
2579                // so now count down the restart interval
2580                if (--z->todo <= 0) {
2581                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2582                   if (!STBI__RESTART(z->marker)) return 1;
2583                   stbi__jpeg_reset(z);
2584                }
2585             }
2586          }
2587          return 1;
2588       }
2589    }
2590 }
2591 
2592 static void stbi__jpeg_dequantize(short *data, stbi_uc *dequant)
2593 {
2594    int i;
2595    for (i=0; i < 64; ++i)
2596       data[i] *= dequant[i];
2597 }
2598 
2599 static void stbi__jpeg_finish(stbi__jpeg *z)
2600 {
2601    if (z->progressive) {
2602       // dequantize and idct the data
2603       int i,j,n;
2604       for (n=0; n < z->s->img_n; ++n) {
2605          int w = (z->img_comp[n].x+7) >> 3;
2606          int h = (z->img_comp[n].y+7) >> 3;
2607          for (j=0; j < h; ++j) {
2608             for (i=0; i < w; ++i) {
2609                short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
2610                stbi__jpeg_dequantize(data, z->dequant[z->img_comp[n].tq]);
2611                z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
2612             }
2613          }
2614       }
2615    }
2616 }
2617 
2618 static int stbi__process_marker(stbi__jpeg *z, int m)
2619 {
2620    int L;
2621    switch (m) {
2622       case STBI__MARKER_none: // no marker found
2623          return stbi__err("expected marker","Corrupt JPEG");
2624 
2625       case 0xDD: // DRI - specify restart interval
2626          if (stbi__get16be(z->s) != 4) return stbi__err("bad DRI len","Corrupt JPEG");
2627          z->restart_interval = stbi__get16be(z->s);
2628          return 1;
2629 
2630       case 0xDB: // DQT - define quantization table
2631          L = stbi__get16be(z->s)-2;
2632          while (L > 0) {
2633             int q = stbi__get8(z->s);
2634             int p = q >> 4;
2635             int t = q & 15,i;
2636             if (p != 0) return stbi__err("bad DQT type","Corrupt JPEG");
2637             if (t > 3) return stbi__err("bad DQT table","Corrupt JPEG");
2638             for (i=0; i < 64; ++i)
2639                z->dequant[t][stbi__jpeg_dezigzag[i]] = stbi__get8(z->s);
2640             L -= 65;
2641          }
2642          return L==0;
2643 
2644       case 0xC4: // DHT - define huffman table
2645          L = stbi__get16be(z->s)-2;
2646          while (L > 0) {
2647             stbi_uc *v;
2648             int sizes[16],i,n=0;
2649             int q = stbi__get8(z->s);
2650             int tc = q >> 4;
2651             int th = q & 15;
2652             if (tc > 1 || th > 3) return stbi__err("bad DHT header","Corrupt JPEG");
2653             for (i=0; i < 16; ++i) {
2654                sizes[i] = stbi__get8(z->s);
2655                n += sizes[i];
2656             }
2657             L -= 17;
2658             if (tc == 0) {
2659                if (!stbi__build_huffman(z->huff_dc+th, sizes)) return 0;
2660                v = z->huff_dc[th].values;
2661             } else {
2662                if (!stbi__build_huffman(z->huff_ac+th, sizes)) return 0;
2663                v = z->huff_ac[th].values;
2664             }
2665             for (i=0; i < n; ++i)
2666                v[i] = stbi__get8(z->s);
2667             if (tc != 0)
2668                stbi__build_fast_ac(z->fast_ac[th], z->huff_ac + th);
2669             L -= n;
2670          }
2671          return L==0;
2672    }
2673    // check for comment block or APP blocks
2674    if ((m >= 0xE0 && m <= 0xEF) || m == 0xFE) {
2675       stbi__skip(z->s, stbi__get16be(z->s)-2);
2676       return 1;
2677    }
2678    return 0;
2679 }
2680 
2681 // after we see SOS
2682 static int stbi__process_scan_header(stbi__jpeg *z)
2683 {
2684    int i;
2685    int Ls = stbi__get16be(z->s);
2686    z->scan_n = stbi__get8(z->s);
2687    if (z->scan_n < 1 || z->scan_n > 4 || z->scan_n > (int) z->s->img_n) return stbi__err("bad SOS component count","Corrupt JPEG");
2688    if (Ls != 6+2*z->scan_n) return stbi__err("bad SOS len","Corrupt JPEG");
2689    for (i=0; i < z->scan_n; ++i) {
2690       int id = stbi__get8(z->s), which;
2691       int q = stbi__get8(z->s);
2692       for (which = 0; which < z->s->img_n; ++which)
2693          if (z->img_comp[which].id == id)
2694             break;
2695       if (which == z->s->img_n) return 0; // no match
2696       z->img_comp[which].hd = q >> 4;   if (z->img_comp[which].hd > 3) return stbi__err("bad DC huff","Corrupt JPEG");
2697       z->img_comp[which].ha = q & 15;   if (z->img_comp[which].ha > 3) return stbi__err("bad AC huff","Corrupt JPEG");
2698       z->order[i] = which;
2699    }
2700 
2701    {
2702       int aa;
2703       z->spec_start = stbi__get8(z->s);
2704       z->spec_end   = stbi__get8(z->s); // should be 63, but might be 0
2705       aa = stbi__get8(z->s);
2706       z->succ_high = (aa >> 4);
2707       z->succ_low  = (aa & 15);
2708       if (z->progressive) {
2709          if (z->spec_start > 63 || z->spec_end > 63  || z->spec_start > z->spec_end || z->succ_high > 13 || z->succ_low > 13)
2710             return stbi__err("bad SOS", "Corrupt JPEG");
2711       } else {
2712          if (z->spec_start != 0) return stbi__err("bad SOS","Corrupt JPEG");
2713          if (z->succ_high != 0 || z->succ_low != 0) return stbi__err("bad SOS","Corrupt JPEG");
2714          z->spec_end = 63;
2715       }
2716    }
2717 
2718    return 1;
2719 }
2720 
2721 static int stbi__process_frame_header(stbi__jpeg *z, int scan)
2722 {
2723    stbi__context *s = z->s;
2724    int Lf,p,i,q, h_max=1,v_max=1,c;
2725    Lf = stbi__get16be(s);         if (Lf < 11) return stbi__err("bad SOF len","Corrupt JPEG"); // JPEG
2726    p  = stbi__get8(s);            if (p != 8) return stbi__err("only 8-bit","JPEG format not supported: 8-bit only"); // JPEG baseline
2727    s->img_y = stbi__get16be(s);   if (s->img_y == 0) return stbi__err("no header height", "JPEG format not supported: delayed height"); // Legal, but we don't handle it--but neither does IJG
2728    s->img_x = stbi__get16be(s);   if (s->img_x == 0) return stbi__err("0 width","Corrupt JPEG"); // JPEG requires
2729    c = stbi__get8(s);
2730    if (c != 3 && c != 1) return stbi__err("bad component count","Corrupt JPEG");    // JFIF requires
2731    s->img_n = c;
2732    for (i=0; i < c; ++i) {
2733       z->img_comp[i].data = NULL;
2734       z->img_comp[i].linebuf = NULL;
2735    }
2736 
2737    if (Lf != 8+3*s->img_n) return stbi__err("bad SOF len","Corrupt JPEG");
2738 
2739    for (i=0; i < s->img_n; ++i) {
2740       z->img_comp[i].id = stbi__get8(s);
2741       if (z->img_comp[i].id != i+1)   // JFIF requires
2742          if (z->img_comp[i].id != i)  // some version of jpegtran outputs non-JFIF-compliant files!
2743             return stbi__err("bad component ID","Corrupt JPEG");
2744       q = stbi__get8(s);
2745       z->img_comp[i].h = (q >> 4);  if (!z->img_comp[i].h || z->img_comp[i].h > 4) return stbi__err("bad H","Corrupt JPEG");
2746       z->img_comp[i].v = q & 15;    if (!z->img_comp[i].v || z->img_comp[i].v > 4) return stbi__err("bad V","Corrupt JPEG");
2747       z->img_comp[i].tq = stbi__get8(s);  if (z->img_comp[i].tq > 3) return stbi__err("bad TQ","Corrupt JPEG");
2748    }
2749 
2750    if (scan != STBI__SCAN_load) return 1;
2751 
2752    if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
2753 
2754    for (i=0; i < s->img_n; ++i) {
2755       if (z->img_comp[i].h > h_max) h_max = z->img_comp[i].h;
2756       if (z->img_comp[i].v > v_max) v_max = z->img_comp[i].v;
2757    }
2758 
2759    // compute interleaved mcu info
2760    z->img_h_max = h_max;
2761    z->img_v_max = v_max;
2762    z->img_mcu_w = h_max * 8;
2763    z->img_mcu_h = v_max * 8;
2764    z->img_mcu_x = (s->img_x + z->img_mcu_w-1) / z->img_mcu_w;
2765    z->img_mcu_y = (s->img_y + z->img_mcu_h-1) / z->img_mcu_h;
2766 
2767    for (i=0; i < s->img_n; ++i) {
2768       // number of effective pixels (e.g. for non-interleaved MCU)
2769       z->img_comp[i].x = (s->img_x * z->img_comp[i].h + h_max-1) / h_max;
2770       z->img_comp[i].y = (s->img_y * z->img_comp[i].v + v_max-1) / v_max;
2771       // to simplify generation, we'll allocate enough memory to decode
2772       // the bogus oversized data from using interleaved MCUs and their
2773       // big blocks (e.g. a 16x16 iMCU on an image of width 33); we won't
2774       // discard the extra data until colorspace conversion
2775       z->img_comp[i].w2 = z->img_mcu_x * z->img_comp[i].h * 8;
2776       z->img_comp[i].h2 = z->img_mcu_y * z->img_comp[i].v * 8;
2777       if (z->img_comp[i].w2 <= 0 || z->img_comp[i].h2 <= 0 ||
2778               (z->img_comp[i].w2 > (INT_MAX - 15) / z->img_comp[i].h2))
2779           return stbi__err("Integer Overflow", "w2 or h2 incorrect");
2780       z->img_comp[i].raw_data = stbi__malloc(z->img_comp[i].w2 * z->img_comp[i].h2+15);
2781 
2782       if (z->img_comp[i].raw_data == NULL) {
2783          for(--i; i >= 0; --i) {
2784             STBI_FREE(z->img_comp[i].raw_data);
2785             z->img_comp[i].raw_data = NULL;
2786          }
2787          return stbi__err("outofmem", "Out of memory");
2788       }
2789       // align blocks for idct using mmx/sse
2790       z->img_comp[i].data = (stbi_uc*) (((size_t) z->img_comp[i].raw_data + 15) & ~15);
2791       z->img_comp[i].linebuf = NULL;
2792       if (z->progressive) {
2793          z->img_comp[i].coeff_w = (z->img_comp[i].w2 + 7) >> 3;
2794          z->img_comp[i].coeff_h = (z->img_comp[i].h2 + 7) >> 3;
2795          z->img_comp[i].raw_coeff = STBI_MALLOC(z->img_comp[i].coeff_w * z->img_comp[i].coeff_h * 64 * sizeof(short) + 15); if (z->img_comp[i].raw_coeff == NULL) return stbi__err("outofmem", "Out of memory");
2796          z->img_comp[i].coeff = (short*) (((size_t) z->img_comp[i].raw_coeff + 15) & ~15);
2797       } else {
2798          z->img_comp[i].coeff = 0;
2799          z->img_comp[i].raw_coeff = 0;
2800       }
2801    }
2802 
2803    return 1;
2804 }
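
// Illustrative sketch, not part of stb_image: the horizontal half of the MCU
// geometry computed above, pulled into a standalone helper so the "image of
// width 33 with a 16x16 iMCU" case mentioned in the comments can be checked
// by hand. The struct and function names are hypothetical; the formulas are
// the ones used for img_mcu_x, img_comp[i].x and img_comp[i].w2.

```c
typedef struct {
   int mcu_x;   /* number of interleaved MCUs per row          */
   int comp_x;  /* effective component width in pixels         */
   int w2;      /* padded component width actually allocated   */
} sketch_geom;

/* img_x: image width, h: this component's horizontal sampling factor,
   h_max: largest horizontal sampling factor among all components */
static sketch_geom sketch_component_width(int img_x, int h, int h_max)
{
   sketch_geom g;
   int mcu_w = h_max * 8;                        /* MCU width in pixels     */
   g.mcu_x  = (img_x + mcu_w - 1) / mcu_w;       /* round up to whole MCUs  */
   g.comp_x = (img_x * h + h_max - 1) / h_max;   /* subsampled pixel count  */
   g.w2     = g.mcu_x * h * 8;                   /* whole 8x8 blocks per MCU */
   return g;
}
```

// With img_x = 33 and h_max = 2 there are 3 MCUs per row. A full-resolution
// component (h = 2) has 33 effective pixels padded to 48; a subsampled one
// (h = 1) has 17 effective pixels padded to 24.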
2805 
2806 // use comparisons since in some cases we handle more than one case (e.g. SOF)
2807 #define stbi__DNL(x)         ((x) == 0xdc)
2808 #define stbi__SOI(x)         ((x) == 0xd8)
2809 #define stbi__EOI(x)         ((x) == 0xd9)
2810 #define stbi__SOF(x)         ((x) == 0xc0 || (x) == 0xc1 || (x) == 0xc2)
2811 #define stbi__SOS(x)         ((x) == 0xda)
2812 
2813 #define stbi__SOF_progressive(x)   ((x) == 0xc2)
2814 
2815 static int stbi__decode_jpeg_header(stbi__jpeg *z, int scan)
2816 {
2817    int m;
2818    z->marker = STBI__MARKER_none; // initialize cached marker to empty
2819    m = stbi__get_marker(z);
2820    if (!stbi__SOI(m)) return stbi__err("no SOI","Corrupt JPEG");
2821    if (scan == STBI__SCAN_type) return 1;
2822    m = stbi__get_marker(z);
2823    while (!stbi__SOF(m)) {
2824       if (!stbi__process_marker(z,m)) return 0;
2825       m = stbi__get_marker(z);
2826       while (m == STBI__MARKER_none) {
2827          // some files have extra padding after their blocks, so ok, we'll scan
2828          if (stbi__at_eof(z->s)) return stbi__err("no SOF", "Corrupt JPEG");
2829          m = stbi__get_marker(z);
2830       }
2831    }
2832    z->progressive = stbi__SOF_progressive(m);
2833    if (!stbi__process_frame_header(z, scan)) return 0;
2834    return 1;
2835 }
2836 
2837 // decode image to YCbCr format
2838 static int stbi__decode_jpeg_image(stbi__jpeg *j)
2839 {
2840    int m;
2841    for (m = 0; m < 4; m++) {
2842       j->img_comp[m].raw_data = NULL;
2843       j->img_comp[m].raw_coeff = NULL;
2844    }
2845    j->restart_interval = 0;
2846    if (!stbi__decode_jpeg_header(j, STBI__SCAN_load)) return 0;
2847    m = stbi__get_marker(j);
2848    while (!stbi__EOI(m)) {
2849       if (stbi__SOS(m)) {
2850          if (!stbi__process_scan_header(j)) return 0;
2851          if (!stbi__parse_entropy_coded_data(j)) return 0;
2852          if (j->marker == STBI__MARKER_none ) {
2853             // handle 0s at the end of image data from IP Kamera 9060
2854             while (!stbi__at_eof(j->s)) {
2855                int x = stbi__get8(j->s);
2856                if (x == 255) {
2857                   j->marker = stbi__get8(j->s);
2858                   break;
2859                } else if (x != 0) {
2860                   return stbi__err("junk before marker", "Corrupt JPEG");
2861                }
2862             }
2863             // if we reach eof without hitting a marker, stbi__get_marker() below will fail and we'll eventually return 0
2864          }
2865       } else {
2866          if (!stbi__process_marker(j, m)) return 0;
2867       }
2868       m = stbi__get_marker(j);
2869    }
2870    if (j->progressive)
2871       stbi__jpeg_finish(j);
2872    return 1;
2873 }
2874 
2875 // static jfif-centered resampling (across block boundaries)
2876 
2877 typedef stbi_uc *(*resample_row_func)(stbi_uc *out, stbi_uc *in0, stbi_uc *in1,
2878                                     int w, int hs);
2879 
2880 #define stbi__div4(x) ((stbi_uc) ((x) >> 2))
2881 
2882 static stbi_uc *resample_row_1(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
2883 {
2884    STBI_NOTUSED(out);
2885    STBI_NOTUSED(in_far);
2886    STBI_NOTUSED(w);
2887    STBI_NOTUSED(hs);
2888    return in_near;
2889 }
2890 
2891 static stbi_uc* stbi__resample_row_v_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
2892 {
2893    // need to generate two samples vertically for every one in input
2894    int i;
2895    STBI_NOTUSED(hs);
2896    for (i=0; i < w; ++i)
2897       out[i] = stbi__div4(3*in_near[i] + in_far[i] + 2);
2898    return out;
2899 }
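
// Illustrative sketch, not part of stb_image: the per-sample arithmetic of the
// vertical 2x filter above, isolated so the rounding can be checked by hand.
// The helper name sketch_blend_3_1 is hypothetical. The output is weighted
// 3/4 toward the nearer input row and 1/4 toward the farther one, and the +2
// rounds the divide-by-4 to nearest.

```c
/* (3*near + far + 2) / 4, as computed via stbi__div4 in the loop above */
static unsigned char sketch_blend_3_1(unsigned char near_px, unsigned char far_px)
{
   return (unsigned char) ((3 * near_px + far_px + 2) >> 2);
}
```

// For near_px = 100 and far_px = 200 the sum is 502, so the result is 125.
// Equal inputs pass through unchanged, e.g. (255, 255) gives 255.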
2900 
2901 static stbi_uc*  stbi__resample_row_h_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
2902 {
2903    // need to generate two samples horizontally for every one in input
2904    int i;
2905    stbi_uc *input = in_near;
2906 
2907    if (w == 1) {
2908       // if only one sample, can't do any interpolation
2909       out[0] = out[1] = input[0];
2910       return out;
2911    }
2912 
2913    out[0] = input[0];
2914    out[1] = stbi__div4(input[0]*3 + input[1] + 2);
2915    for (i=1; i < w-1; ++i) {
2916       int n = 3*input[i]+2;
2917       out[i*2+0] = stbi__div4(n+input[i-1]);
2918       out[i*2+1] = stbi__div4(n+input[i+1]);
2919    }
2920    out[i*2+0] = stbi__div4(input[w-2]*3 + input[w-1] + 2);
2921    out[i*2+1] = input[w-1];
2922 
2923    STBI_NOTUSED(in_far);
2924    STBI_NOTUSED(hs);
2925 
2926    return out;
2927 }
2928 
2929 #define stbi__div16(x) ((stbi_uc) ((x) >> 4))
2930 
2931 static stbi_uc *stbi__resample_row_hv_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
2932 {
2933    // need to generate 2x2 samples for every one in input
2934    int i,t0,t1;
2935    if (w == 1) {
2936       out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
2937       return out;
2938    }
2939 
2940    t1 = 3*in_near[0] + in_far[0];
2941    out[0] = stbi__div4(t1+2);
2942    for (i=1; i < w; ++i) {
2943       t0 = t1;
2944       t1 = 3*in_near[i]+in_far[i];
2945       out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
2946       out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
2947    }
2948    out[w*2-1] = stbi__div4(t1+2);
2949 
2950    STBI_NOTUSED(hs);
2951 
2952    return out;
2953 }
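
// Illustrative sketch, not part of stb_image: a standalone restatement of the
// 2x2 upsampling row filter above, using plain unsigned char so it compiles
// on its own and a small input can be traced by hand. The name sketch_hv2_row
// is hypothetical; the arithmetic is meant to follow the scalar loop above.

```c
/* Writes 2*w output samples from w near-row and w far-row input samples. */
static void sketch_hv2_row(unsigned char *out, const unsigned char *in_near,
                           const unsigned char *in_far, int w)
{
   int i, t0, t1;
   /* vertical pass: 3/4 near + 1/4 far, kept scaled by 4 */
   t1 = 3*in_near[0] + in_far[0];
   out[0] = (unsigned char) ((t1 + 2) >> 2);       /* left edge: no neighbor */
   if (w == 1) { out[1] = out[0]; return; }
   for (i = 1; i < w; ++i) {
      t0 = t1;
      t1 = 3*in_near[i] + in_far[i];
      /* horizontal pass: 3/4 own column + 1/4 neighbor, total scale 16 */
      out[i*2-1] = (unsigned char) ((3*t0 + t1 + 8) >> 4);
      out[i*2  ] = (unsigned char) ((3*t1 + t0 + 8) >> 4);
   }
   out[w*2-1] = (unsigned char) ((t1 + 2) >> 2);   /* right edge */
}
```

// With both rows equal to {0, 16} and w = 2, the output is {0, 4, 12, 16}:
// the two edge samples are copied through and the two interior samples land
// at the 1/4 and 3/4 points between them.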
2954 
2955 #if defined(STBI_SSE2) || defined(STBI_NEON)
2956 static stbi_uc *stbi__resample_row_hv_2_simd(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
2957 {
2958    // need to generate 2x2 samples for every one in input
2959    int i=0,t0,t1;
2960 
2961    if (w == 1) {
2962       out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
2963       return out;
2964    }
2965 
2966    t1 = 3*in_near[0] + in_far[0];
2967    // process groups of 8 pixels for as long as we can.
2968    // note we can't handle the last pixel in a row in this loop
2969    // because we need to handle the filter boundary conditions.
2970    for (; i < ((w-1) & ~7); i += 8) {
2971 #if defined(STBI_SSE2)
2972       // load and perform the vertical filtering pass
2973       // this uses 3*x + y = 4*x + (y - x)
2974       __m128i zero  = _mm_setzero_si128();
2975       __m128i farb  = _mm_loadl_epi64((__m128i *) (in_far + i));
2976       __m128i nearb = _mm_loadl_epi64((__m128i *) (in_near + i));
2977       __m128i farw  = _mm_unpacklo_epi8(farb, zero);
2978       __m128i nearw = _mm_unpacklo_epi8(nearb, zero);
2979       __m128i diff  = _mm_sub_epi16(farw, nearw);
2980       __m128i nears = _mm_slli_epi16(nearw, 2);
2981       __m128i curr  = _mm_add_epi16(nears, diff); // current row
2982 
2983       // horizontal filter works the same based on shifted vers of current
2984       // row. "prev" is current row shifted right by 1 pixel; we need to
2985       // insert the previous pixel value (from t1).
2986       // "next" is current row shifted left by 1 pixel, with first pixel
2987       // of next block of 8 pixels added in.
2988       __m128i prv0 = _mm_slli_si128(curr, 2);
2989       __m128i nxt0 = _mm_srli_si128(curr, 2);
2990       __m128i prev = _mm_insert_epi16(prv0, t1, 0);
2991       __m128i next = _mm_insert_epi16(nxt0, 3*in_near[i+8] + in_far[i+8], 7);
2992 
2993       // horizontal filter, polyphase implementation since it's convenient:
2994       // even pixels = 3*cur + prev = cur*4 + (prev - cur)
2995       // odd  pixels = 3*cur + next = cur*4 + (next - cur)
2996       // note the shared term.
2997       __m128i bias  = _mm_set1_epi16(8);
2998       __m128i curs = _mm_slli_epi16(curr, 2);
2999       __m128i prvd = _mm_sub_epi16(prev, curr);
3000       __m128i nxtd = _mm_sub_epi16(next, curr);
3001       __m128i curb = _mm_add_epi16(curs, bias);
3002       __m128i even = _mm_add_epi16(prvd, curb);
3003       __m128i odd  = _mm_add_epi16(nxtd, curb);
3004 
3005       // interleave even and odd pixels, then undo scaling.
3006       __m128i int0 = _mm_unpacklo_epi16(even, odd);
3007       __m128i int1 = _mm_unpackhi_epi16(even, odd);
3008       __m128i de0  = _mm_srli_epi16(int0, 4);
3009       __m128i de1  = _mm_srli_epi16(int1, 4);
3010 
3011       // pack and write output
3012       __m128i outv = _mm_packus_epi16(de0, de1);
3013       _mm_storeu_si128((__m128i *) (out + i*2), outv);
3014 #elif defined(STBI_NEON)
3015       // load and perform the vertical filtering pass
3016       // this uses 3*x + y = 4*x + (y - x)
3017       uint8x8_t farb  = vld1_u8(in_far + i);
3018       uint8x8_t nearb = vld1_u8(in_near + i);
3019       int16x8_t diff  = vreinterpretq_s16_u16(vsubl_u8(farb, nearb));
3020       int16x8_t nears = vreinterpretq_s16_u16(vshll_n_u8(nearb, 2));
3021       int16x8_t curr  = vaddq_s16(nears, diff); // current row
3022 
3023       // horizontal filter works the same based on shifted vers of current
3024       // row. "prev" is current row shifted right by 1 pixel; we need to
3025       // insert the previous pixel value (from t1).
3026       // "next" is current row shifted left by 1 pixel, with first pixel
3027       // of next block of 8 pixels added in.
3028       int16x8_t prv0 = vextq_s16(curr, curr, 7);
3029       int16x8_t nxt0 = vextq_s16(curr, curr, 1);
3030       int16x8_t prev = vsetq_lane_s16(t1, prv0, 0);
3031       int16x8_t next = vsetq_lane_s16(3*in_near[i+8] + in_far[i+8], nxt0, 7);
3032 
3033       // horizontal filter, polyphase implementation since it's convenient:
3034       // even pixels = 3*cur + prev = cur*4 + (prev - cur)
3035       // odd  pixels = 3*cur + next = cur*4 + (next - cur)
3036       // note the shared term.
3037       int16x8_t curs = vshlq_n_s16(curr, 2);
3038       int16x8_t prvd = vsubq_s16(prev, curr);
3039       int16x8_t nxtd = vsubq_s16(next, curr);
3040       int16x8_t even = vaddq_s16(curs, prvd);
3041       int16x8_t odd  = vaddq_s16(curs, nxtd);
3042 
3043       // undo scaling and round, then store with even/odd phases interleaved
3044       uint8x8x2_t o;
3045       o.val[0] = vqrshrun_n_s16(even, 4);
3046       o.val[1] = vqrshrun_n_s16(odd,  4);
3047       vst2_u8(out + i*2, o);
3048 #endif
3049 
3050       // "previous" value for next iter
3051       t1 = 3*in_near[i+7] + in_far[i+7];
3052    }
3053 
3054    t0 = t1;
3055    t1 = 3*in_near[i] + in_far[i];
3056    out[i*2] = stbi__div16(3*t1 + t0 + 8);
3057 
3058    for (++i; i < w; ++i) {
3059       t0 = t1;
3060       t1 = 3*in_near[i]+in_far[i];
3061       out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
3062       out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
3063    }
3064    out[w*2-1] = stbi__div4(t1+2);
3065 
3066    STBI_NOTUSED(hs);
3067 
3068    return out;
3069 }
3070 #endif
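
// Illustrative sketch, not part of stb_image: a scalar check of the identity
// the SIMD path relies on, 3*cur + n = 4*cur + (n - cur). The function names
// are hypothetical. The factored form needs only a shift, a subtract and an
// add per lane; the +8 is the rounding term, which the SSE2 path adds as an
// explicit bias and the NEON path gets from the rounding shift.

```c
/* direct form, as written in the scalar loop */
static int sketch_direct(int cur, int neighbor)
{
   return (3*cur + neighbor + 8) >> 4;
}

/* factored form, as used by the SIMD path (cur is non-negative here) */
static int sketch_factored(int cur, int neighbor)
{
   return ((cur << 2) + (neighbor - cur) + 8) >> 4;
}
```

// Both forms agree for every pair of vertically filtered values, which lie
// in 0..1020 (3*255 + 255).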
3071 
3072 static stbi_uc *stbi__resample_row_generic(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3073 {
3074    // resample with nearest-neighbor
3075    int i,j;
3076    STBI_NOTUSED(in_far);
3077    for (i=0; i < w; ++i)
3078       for (j=0; j < hs; ++j)
3079          out[i*hs+j] = in_near[i];
3080    return out;
3081 }
3082 
3083 #ifdef STBI_JPEG_OLD
3084 // this is the same YCbCr-to-RGB calculation that stb_image has used
3085 // historically before the algorithm changes in 1.49
3086 #define float2fixed(x)  ((int) ((x) * 65536 + 0.5))
3087 static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
3088 {
3089    int i;
3090    for (i=0; i < count; ++i) {
3091       int y_fixed = (y[i] << 16) + 32768; // rounding
3092       int r,g,b;
3093       int cr = pcr[i] - 128;
3094       int cb = pcb[i] - 128;
3095       r = y_fixed + cr*float2fixed(1.40200f);
3096       g = y_fixed - cr*float2fixed(0.71414f) - cb*float2fixed(0.34414f);
3097       b = y_fixed                            + cb*float2fixed(1.77200f);
3098       r >>= 16;
3099       g >>= 16;
3100       b >>= 16;
3101       if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
3102       if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
3103       if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
3104       out[0] = (stbi_uc)r;
3105       out[1] = (stbi_uc)g;
3106       out[2] = (stbi_uc)b;
3107       out[3] = 255;
3108       out += step;
3109    }
3110 }
3111 #else
3112 // this is a reduced-precision calculation of YCbCr-to-RGB, introduced
3113 // to make sure the scalar and SIMD code paths produce the same results
3114 #define float2fixed(x)  (((int) ((x) * 4096.0f + 0.5f)) << 8)
3115 static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
3116 {
3117    int i;
3118    for (i=0; i < count; ++i) {
3119       int y_fixed = (y[i] << 20) + (1<<19); // rounding
3120       int r,g,b;
3121       int cr = pcr[i] - 128;
3122       int cb = pcb[i] - 128;
3123       r = y_fixed +  cr* float2fixed(1.40200f);
3124       g = y_fixed + (cr*-float2fixed(0.71414f)) + ((cb*-float2fixed(0.34414f)) & 0xffff0000);
3125       b = y_fixed                               +   cb* float2fixed(1.77200f);
3126       r >>= 20;
3127       g >>= 20;
3128       b >>= 20;
3129       if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
3130       if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
3131       if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
3132       out[0] = (stbi_uc)r;
3133       out[1] = (stbi_uc)g;
3134       out[2] = (stbi_uc)b;
3135       out[3] = 255;
3136       out += step;
3137    }
3138 }
3139 #endif
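
// A minimal single-pixel sketch of the reduced-precision conversion above:
// constants are rounded to 12 fractional bits and then moved up to 20, luma
// carries a half-unit rounding bias, and the result is shifted back down and
// clamped. The names sketch_* are hypothetical. Two deliberate differences
// from the code above: the 0xffff0000 mask on the cb term of green is omitted,
// and negative values are clamped before the shift so the sketch does not
// depend on how the compiler shifts negative integers.

```c
#include <assert.h>

/* 12-bit fixed-point constant, moved up to 20 fractional bits */
#define SKETCH_FIX(x)  (((int) ((x) * 4096.0f + 0.5f)) << 8)

/* drop the 20 fractional bits and clamp to 0..255 */
static unsigned char sketch_descale(int v)
{
   if (v < 0) return 0;
   v >>= 20;
   return (unsigned char) (v > 255 ? 255 : v);
}

/* Hypothetical helper: convert one YCbCr sample triple (each 0..255) to RGB. */
static void sketch_ycbcr_to_rgb(int y, int cb, int cr, unsigned char rgb[3])
{
   int y_fixed, cr_c, cb_c;
   assert(y >= 0 && y <= 255 && cb >= 0 && cb <= 255 && cr >= 0 && cr <= 255);

   y_fixed = (y << 20) + (1 << 19);   /* + 0.5 for rounding */
   cr_c = cr - 128;                   /* chroma is centred on 128 */
   cb_c = cb - 128;

   rgb[0] = sketch_descale(y_fixed + cr_c * SKETCH_FIX(1.40200f));
   rgb[1] = sketch_descale(y_fixed - cr_c * SKETCH_FIX(0.71414f)
                                   - cb_c * SKETCH_FIX(0.34414f));
   rgb[2] = sketch_descale(y_fixed + cb_c * SKETCH_FIX(1.77200f));
}
```

// With neutral chroma (cb == cr == 128) both chroma terms vanish, so a grey
// input returns the luma value unchanged in all three channels.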
3140 
3141 #if defined(STBI_SSE2) || defined(STBI_NEON)
3142 static void stbi__YCbCr_to_RGB_simd(stbi_uc *out, stbi_uc const *y, stbi_uc const *pcb, stbi_uc const *pcr, int count, int step)
3143 {
3144    int i = 0;
3145 
3146 #ifdef STBI_SSE2
3147    // step == 3 is pretty ugly on the final interleave, and i'm not convinced
3148    // it's useful in practice (you wouldn't use it for textures, for example).
3149    // so just accelerate step == 4 case.
3150    if (step == 4) {
3151       // this is a fairly straightforward implementation and not super-optimized.
3152       __m128i signflip  = _mm_set1_epi8(-0x80);
3153       __m128i cr_const0 = _mm_set1_epi16(   (short) ( 1.40200f*4096.0f+0.5f));
3154       __m128i cr_const1 = _mm_set1_epi16( - (short) ( 0.71414f*4096.0f+0.5f));
3155       __m128i cb_const0 = _mm_set1_epi16( - (short) ( 0.34414f*4096.0f+0.5f));
3156       __m128i cb_const1 = _mm_set1_epi16(   (short) ( 1.77200f*4096.0f+0.5f));
3157       __m128i y_bias = _mm_set1_epi8((char) (unsigned char) 128);
3158       __m128i xw = _mm_set1_epi16(255); // alpha channel
3159 
3160       for (; i+7 < count; i += 8) {
3161          // load
3162          __m128i y_bytes = _mm_loadl_epi64((__m128i *) (y+i));
3163          __m128i cr_bytes = _mm_loadl_epi64((__m128i *) (pcr+i));
3164          __m128i cb_bytes = _mm_loadl_epi64((__m128i *) (pcb+i));
3165          __m128i cr_biased = _mm_xor_si128(cr_bytes, signflip); // -128
3166          __m128i cb_biased = _mm_xor_si128(cb_bytes, signflip); // -128
3167 
3168          // unpack to short (and left-shift cr, cb by 8)
3169          __m128i yw  = _mm_unpacklo_epi8(y_bias, y_bytes);
3170          __m128i crw = _mm_unpacklo_epi8(_mm_setzero_si128(), cr_biased);
3171          __m128i cbw = _mm_unpacklo_epi8(_mm_setzero_si128(), cb_biased);
3172 
3173          // color transform
3174          __m128i yws = _mm_srli_epi16(yw, 4);
3175          __m128i cr0 = _mm_mulhi_epi16(cr_const0, crw);
3176          __m128i cb0 = _mm_mulhi_epi16(cb_const0, cbw);
3177          __m128i cb1 = _mm_mulhi_epi16(cbw, cb_const1);
3178          __m128i cr1 = _mm_mulhi_epi16(crw, cr_const1);
3179          __m128i rws = _mm_add_epi16(cr0, yws);
3180          __m128i gwt = _mm_add_epi16(cb0, yws);
3181          __m128i bws = _mm_add_epi16(yws, cb1);
3182          __m128i gws = _mm_add_epi16(gwt, cr1);
3183 
3184          // descale
3185          __m128i rw = _mm_srai_epi16(rws, 4);
3186          __m128i bw = _mm_srai_epi16(bws, 4);
3187          __m128i gw = _mm_srai_epi16(gws, 4);
3188 
3189          // back to byte, set up for transpose
3190          __m128i brb = _mm_packus_epi16(rw, bw);
3191          __m128i gxb = _mm_packus_epi16(gw, xw);
3192 
3193          // transpose to interleave channels
3194          __m128i t0 = _mm_unpacklo_epi8(brb, gxb);
3195          __m128i t1 = _mm_unpackhi_epi8(brb, gxb);
3196          __m128i o0 = _mm_unpacklo_epi16(t0, t1);
3197          __m128i o1 = _mm_unpackhi_epi16(t0, t1);
3198 
3199          // store
3200          _mm_storeu_si128((__m128i *) (out + 0), o0);
3201          _mm_storeu_si128((__m128i *) (out + 16), o1);
3202          out += 32;
3203       }
3204    }
3205 #endif
3206 
3207 #ifdef STBI_NEON
3208    // in this version, step=3 support would be easy to add. but is there demand?
3209    if (step == 4) {
3210       // this is a fairly straightforward implementation and not super-optimized.
3211       uint8x8_t signflip = vdup_n_u8(0x80);
3212       int16x8_t cr_const0 = vdupq_n_s16(   (short) ( 1.40200f*4096.0f+0.5f));
3213       int16x8_t cr_const1 = vdupq_n_s16( - (short) ( 0.71414f*4096.0f+0.5f));
3214       int16x8_t cb_const0 = vdupq_n_s16( - (short) ( 0.34414f*4096.0f+0.5f));
3215       int16x8_t cb_const1 = vdupq_n_s16(   (short) ( 1.77200f*4096.0f+0.5f));
3216 
3217       for (; i+7 < count; i += 8) {
3218          // load
3219          uint8x8_t y_bytes  = vld1_u8(y + i);
3220          uint8x8_t cr_bytes = vld1_u8(pcr + i);
3221          uint8x8_t cb_bytes = vld1_u8(pcb + i);
3222          int8x8_t cr_biased = vreinterpret_s8_u8(vsub_u8(cr_bytes, signflip));
3223          int8x8_t cb_biased = vreinterpret_s8_u8(vsub_u8(cb_bytes, signflip));
3224 
3225          // expand to s16
3226          int16x8_t yws = vreinterpretq_s16_u16(vshll_n_u8(y_bytes, 4));
3227          int16x8_t crw = vshll_n_s8(cr_biased, 7);
3228          int16x8_t cbw = vshll_n_s8(cb_biased, 7);
3229 
3230          // color transform
3231          int16x8_t cr0 = vqdmulhq_s16(crw, cr_const0);
3232          int16x8_t cb0 = vqdmulhq_s16(cbw, cb_const0);
3233          int16x8_t cr1 = vqdmulhq_s16(crw, cr_const1);
3234          int16x8_t cb1 = vqdmulhq_s16(cbw, cb_const1);
3235          int16x8_t rws = vaddq_s16(yws, cr0);
3236          int16x8_t gws = vaddq_s16(vaddq_s16(yws, cb0), cr1);
3237          int16x8_t bws = vaddq_s16(yws, cb1);
3238 
3239          // undo scaling, round, convert to byte
3240          uint8x8x4_t o;
3241          o.val[0] = vqrshrun_n_s16(rws, 4);
3242          o.val[1] = vqrshrun_n_s16(gws, 4);
3243          o.val[2] = vqrshrun_n_s16(bws, 4);
3244          o.val[3] = vdup_n_u8(255);
3245 
3246          // store, interleaving r/g/b/a
3247          vst4_u8(out, o);
3248          out += 8*4;
3249       }
3250    }
3251 #endif
3252 
3253    for (; i < count; ++i) {
3254       int y_fixed = (y[i] << 20) + (1<<19); // rounding
3255       int r,g,b;
3256       int cr = pcr[i] - 128;
3257       int cb = pcb[i] - 128;
3258       r = y_fixed + cr* float2fixed(1.40200f);
3259       g = y_fixed + cr*-float2fixed(0.71414f) + ((cb*-float2fixed(0.34414f)) & 0xffff0000);
3260       b = y_fixed                             +   cb* float2fixed(1.77200f);
3261       r >>= 20;
3262       g >>= 20;
3263       b >>= 20;
3264       if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
3265       if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
3266       if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
3267       out[0] = (stbi_uc)r;
3268       out[1] = (stbi_uc)g;
3269       out[2] = (stbi_uc)b;
3270       out[3] = 255;
3271       out += step;
3272    }
3273 }
3274 #endif
3275 
3276 // set up the kernels
3277 static void stbi__setup_jpeg(stbi__jpeg *j)
3278 {
3279    j->idct_block_kernel = stbi__idct_block;
3280    j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_row;
3281    j->resample_row_hv_2_kernel = stbi__resample_row_hv_2;
3282 
3283 #ifdef STBI_SSE2
3284    if (stbi__sse2_available()) {
3285       j->idct_block_kernel = stbi__idct_simd;
3286       #ifndef STBI_JPEG_OLD
3287       j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
3288       #endif
3289       j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
3290    }
3291 #endif
3292 
3293 #ifdef STBI_NEON
3294    j->idct_block_kernel = stbi__idct_simd;
3295    #ifndef STBI_JPEG_OLD
3296    j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
3297    #endif
3298    j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
3299 #endif
3300 }
3301 
3302 // clean up the temporary component buffers
3303 static void stbi__cleanup_jpeg(stbi__jpeg *j)
3304 {
3305    int i;
3306    for (i=0; i < j->s->img_n; ++i) {
3307       if (j->img_comp[i].raw_data) {
3308          STBI_FREE(j->img_comp[i].raw_data);
3309          j->img_comp[i].raw_data = NULL;
3310          j->img_comp[i].data = NULL;
3311       }
3312       if (j->img_comp[i].raw_coeff) {
3313          STBI_FREE(j->img_comp[i].raw_coeff);
3314          j->img_comp[i].raw_coeff = 0;
3315          j->img_comp[i].coeff = 0;
3316       }
3317       if (j->img_comp[i].linebuf) {
3318          STBI_FREE(j->img_comp[i].linebuf);
3319          j->img_comp[i].linebuf = NULL;
3320       }
3321    }
3322 }
3323 
3324 typedef struct
3325 {
3326    resample_row_func resample;
3327    stbi_uc *line0,*line1;
3328    int hs,vs;   // expansion factor in each axis
3329    int w_lores; // horizontal pixels pre-expansion
3330    int ystep;   // how far through vertical expansion we are
3331    int ypos;    // which pre-expansion row we're on
3332 } stbi__resample;
3333 
3334 static stbi_uc *load_jpeg_image(stbi__jpeg *z, int *out_x, int *out_y, int *comp, int req_comp)
3335 {
3336    int n, decode_n;
3337    z->s->img_n = 0; // make stbi__cleanup_jpeg safe
3338 
3339    // validate req_comp
3340    if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
3341 
3342    // load a jpeg image from whichever source, but leave in YCbCr format
3343    if (!stbi__decode_jpeg_image(z)) { stbi__cleanup_jpeg(z); return NULL; }
3344 
3345    // determine actual number of components to generate
3346    n = req_comp ? req_comp : z->s->img_n;
3347 
3348    if (z->s->img_n == 3 && n < 3)
3349       decode_n = 1;
3350    else
3351       decode_n = z->s->img_n;
3352 
3353    // resample and color-convert
3354    {
3355       int k;
3356       unsigned int i,j;
3357       stbi_uc *output;
3358       stbi_uc *coutput[4];
3359 
3360       stbi__resample res_comp[4];
3361 
3362       for (k=0; k < decode_n; ++k) {
3363          stbi__resample *r = &res_comp[k];
3364 
3365          // allocate line buffer big enough for upsampling off the edges
3366          // with upsample factor of 4
3367          if (z->s->img_x > (INT_MAX - 3))
3368              { stbi__cleanup_jpeg(z); return stbi__errpuc("Integer Overflow", "z->s->img_x incorrect"); }
3369          z->img_comp[k].linebuf = (stbi_uc *) stbi__malloc(z->s->img_x + 3);
3370          if (!z->img_comp[k].linebuf) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
3371 
3372          r->hs      = z->img_h_max / z->img_comp[k].h;
3373          r->vs      = z->img_v_max / z->img_comp[k].v;
3374          r->ystep   = r->vs >> 1;
3375          r->w_lores = (z->s->img_x + r->hs-1) / r->hs;
3376          r->ypos    = 0;
3377          r->line0   = r->line1 = z->img_comp[k].data;
3378 
3379          if      (r->hs == 1 && r->vs == 1) r->resample = resample_row_1;
3380          else if (r->hs == 1 && r->vs == 2) r->resample = stbi__resample_row_v_2;
3381          else if (r->hs == 2 && r->vs == 1) r->resample = stbi__resample_row_h_2;
3382          else if (r->hs == 2 && r->vs == 2) r->resample = z->resample_row_hv_2_kernel;
3383          else                               r->resample = stbi__resample_row_generic;
3384       }
3385 
3386       // validate the output size before allocating; release the decode buffers on failure
3387       if(n <= 0 || z->s->img_x <= 0 || z->s->img_y <= 0 ||
3388               (z->s->img_y > (INT_MAX - 1) / z->s->img_x / n))
3389           { stbi__cleanup_jpeg(z); return stbi__errpuc("Integer Overflow", "z->s->img_x or z->s->img_y incorrect"); }
3390       output = (stbi_uc *) stbi__malloc(n * z->s->img_x * z->s->img_y + 1);
3391       if (!output) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
3392 
3393       // now go ahead and resample
3394       for (j=0; j < z->s->img_y; ++j) {
3395          stbi_uc *out = output + n * z->s->img_x * j;
3396          for (k=0; k < decode_n; ++k) {
3397             stbi__resample *r = &res_comp[k];
3398             int y_bot = r->ystep >= (r->vs >> 1);
3399             coutput[k] = r->resample(z->img_comp[k].linebuf,
3400                                      y_bot ? r->line1 : r->line0,
3401                                      y_bot ? r->line0 : r->line1,
3402                                      r->w_lores, r->hs);
3403             if (++r->ystep >= r->vs) {
3404                r->ystep = 0;
3405                r->line0 = r->line1;
3406                if (++r->ypos < z->img_comp[k].y)
3407                   r->line1 += z->img_comp[k].w2;
3408             }
3409          }
3410          if (n >= 3) {
3411             stbi_uc *y = coutput[0];
3412             if (z->s->img_n == 3) {
3413                z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
3414             } else
3415                for (i=0; i < z->s->img_x; ++i) {
3416                   out[0] = out[1] = out[2] = y[i];
3417                   out[3] = 255; // not used if n==3
3418                   out += n;
3419                }
3420          } else {
3421             stbi_uc *y = coutput[0];
3422             if (n == 1)
3423                for (i=0; i < z->s->img_x; ++i) out[i] = y[i];
3424             else
3425                for (i=0; i < z->s->img_x; ++i) *out++ = y[i], *out++ = 255;
3426          }
3427       }
3428       stbi__cleanup_jpeg(z);
3429       *out_x = z->s->img_x;
3430       *out_y = z->s->img_y;
3431       if (comp) *comp  = z->s->img_n; // report original components, not output
3432       return output;
3433    }
3434 }
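
// The size check in load_jpeg_image avoids computing n * img_x * img_y before
// it is known to fit: it divides the limit instead of multiplying the
// operands. Below is a minimal sketch of the same test in isolation. The name
// sketch_size_fits is hypothetical, and plain int is used for all three
// operands for simplicity.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: returns 1 when n*x*y + 1 is representable as an int,
   0 otherwise (including when any operand is not positive). */
static int sketch_size_fits(int n, int x, int y)
{
   if (n <= 0 || x <= 0 || y <= 0) return 0;
   /* floor(floor(M/x)/n) == floor(M/(x*n)) for positive integers, so this
      is equivalent to n*x*y <= INT_MAX-1 without ever forming the product */
   return y <= (INT_MAX - 1) / x / n;
}
```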
3435 
3436 static unsigned char *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
3437 {
3438    stbi__jpeg j;
3439    j.s = s;
3440    stbi__setup_jpeg(&j);
3441    return load_jpeg_image(&j, x,y,comp,req_comp);
3442 }
3443 
3444 static int stbi__jpeg_test(stbi__context *s)
3445 {
3446    int r;
3447    stbi__jpeg j;
3448    j.s = s;
3449    stbi__setup_jpeg(&j);
3450    r = stbi__decode_jpeg_header(&j, STBI__SCAN_type);
3451    stbi__rewind(s);
3452    return r;
3453 }
3454 
3455 static int stbi__jpeg_info_raw(stbi__jpeg *j, int *x, int *y, int *comp)
3456 {
3457    if (!stbi__decode_jpeg_header(j, STBI__SCAN_header)) {
3458       stbi__rewind( j->s );
3459       return 0;
3460    }
3461    if (x) *x = j->s->img_x;
3462    if (y) *y = j->s->img_y;
3463    if (comp) *comp = j->s->img_n;
3464    return 1;
3465 }
3466 
3467 static int stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp)
3468 {
3469    stbi__jpeg j;
3470    j.s = s;
3471    return stbi__jpeg_info_raw(&j, x, y, comp);
3472 }
3473 #endif
3474 
3475 // public domain zlib decode    v0.2  Sean Barrett 2006-11-18
3476 //    simple implementation
3477 //      - all input must be provided in an upfront buffer
3478 //      - all output is written to a single output buffer (can malloc/realloc)
3479 //    performance
3480 //      - fast huffman
3481 
3482 #ifndef STBI_NO_ZLIB
3483 
3484 // fast-way is faster to check than jpeg huffman, but slow way is slower
3485 #define STBI__ZFAST_BITS  9 // accelerate all cases in default tables
3486 #define STBI__ZFAST_MASK  ((1 << STBI__ZFAST_BITS) - 1)
3487 
3488 // zlib-style huffman encoding
3489 // (jpeg packs bits from the left, zlib from the right, so they can't share code)
3490 typedef struct
3491 {
3492    stbi__uint16 fast[1 << STBI__ZFAST_BITS];
3493    stbi__uint16 firstcode[16];
3494    int maxcode[17];
3495    stbi__uint16 firstsymbol[16];
3496    stbi_uc  size[288];
3497    stbi__uint16 value[288];
3498 } stbi__zhuffman;
3499 
3500 stbi_inline static int stbi__bitreverse16(int n)
3501 {
3502   n = ((n & 0xAAAA) >>  1) | ((n & 0x5555) << 1);
3503   n = ((n & 0xCCCC) >>  2) | ((n & 0x3333) << 2);
3504   n = ((n & 0xF0F0) >>  4) | ((n & 0x0F0F) << 4);
3505   n = ((n & 0xFF00) >>  8) | ((n & 0x00FF) << 8);
3506   return n;
3507 }
3508 
3509 stbi_inline static int stbi__bit_reverse(int v, int bits)
3510 {
3511    STBI_ASSERT(bits <= 16);
3512    // to bit reverse n bits, reverse 16 and shift
3513    // e.g. 11 bits, bit reverse and shift away 5
3514    return stbi__bitreverse16(v) >> (16-bits);
3515 }
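
// For comparison with the masked-swap version above, here is a plain
// bit-at-a-time reversal of the low `bits` bits. It is a reference sketch
// only (the name sketch_reverse_bits is hypothetical): slower, but easy to
// check by hand, and it should agree with stbi__bit_reverse for
// 0 < bits <= 16.

```c
#include <assert.h>

/* Hypothetical helper: reverse the low `bits` bits of v, one bit per step. */
static unsigned sketch_reverse_bits(unsigned v, int bits)
{
   unsigned r = 0;
   int i;
   assert(bits >= 0 && bits <= 16);
   for (i = 0; i < bits; ++i) {
      r = (r << 1) | (v & 1u);   /* move the lowest bit of v onto the end of r */
      v >>= 1;
   }
   return r;
}
```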
3516 
3517 static int stbi__zbuild_huffman(stbi__zhuffman *z, stbi_uc *sizelist, int num)
3518 {
3519    int i,k=0;
3520    int code, next_code[16], sizes[17];
3521 
3522    // DEFLATE spec for generating codes
3523    memset(sizes, 0, sizeof(sizes));
3524    memset(z->fast, 0, sizeof(z->fast));
3525    for (i=0; i < num; ++i)
3526       ++sizes[sizelist[i]];
3527    sizes[0] = 0;
3528    for (i=1; i < 16; ++i)
3529       if (sizes[i] > (1 << i))
3530          return stbi__err("bad sizes", "Corrupt PNG");
3531    code = 0;
3532    for (i=1; i < 16; ++i) {
3533       next_code[i] = code;
3534       z->firstcode[i] = (stbi__uint16) code;
3535       z->firstsymbol[i] = (stbi__uint16) k;
3536       code = (code + sizes[i]);
3537       if (sizes[i])
3538          if (code-1 >= (1 << i)) return stbi__err("bad codelengths","Corrupt PNG");
3539       z->maxcode[i] = code << (16-i); // preshift for inner loop
3540       code <<= 1;
3541       k += sizes[i];
3542    }
3543    z->maxcode[16] = 0x10000; // sentinel
3544    for (i=0; i < num; ++i) {
3545       int s = sizelist[i];
3546       if (s) {
3547          int c = next_code[s] - z->firstcode[s] + z->firstsymbol[s];
3548          stbi__uint16 fastv = (stbi__uint16) ((s << 9) | i);
3549          z->size [c] = (stbi_uc     ) s;
3550          z->value[c] = (stbi__uint16) i;
3551          if (s <= STBI__ZFAST_BITS) {
3552             int j = stbi__bit_reverse(next_code[s],s);
3553             while (j < (1 << STBI__ZFAST_BITS)) {
3554                z->fast[j] = fastv;
3555                j += (1 << s);
3556             }
3557          }
3558          ++next_code[s];
3559       }
3560    }
3561    return 1;
3562 }
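
// stbi__zbuild_huffman follows the DEFLATE rule for canonical codes: count
// the symbols of each length, derive the first code of each length, then hand
// out consecutive codes in symbol order. Below is a minimal sketch of just
// that assignment step, without the fast table or the validity checks. The
// names sketch_assign_codes and SKETCH_MAX_BITS are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MAX_BITS 15

/* Hypothetical helper: given one code length per symbol (0 = unused), write
   the canonical code for each symbol into codes[]. Does not validate that
   the lengths describe a complete or non-oversubscribed code. */
static void sketch_assign_codes(const unsigned char *lengths, int num,
                                unsigned short *codes)
{
   int count[SKETCH_MAX_BITS + 1];
   int next_code[SKETCH_MAX_BITS + 1];
   int i, bits, code = 0;

   /* step 1: how many symbols use each code length */
   memset(count, 0, sizeof(count));
   for (i = 0; i < num; ++i) {
      assert(lengths[i] <= SKETCH_MAX_BITS);
      ++count[lengths[i]];
   }
   count[0] = 0;   /* unused symbols take no code space */

   /* step 2: first code of each length */
   next_code[0] = 0;
   for (bits = 1; bits <= SKETCH_MAX_BITS; ++bits) {
      code = (code + count[bits - 1]) << 1;
      next_code[bits] = code;
   }

   /* step 3: consecutive codes within each length, in symbol order */
   for (i = 0; i < num; ++i) {
      codes[i] = 0;
      if (lengths[i]) codes[i] = (unsigned short) next_code[lengths[i]]++;
   }
}
```

// For the lengths 3,3,3,3,3,2,4,4 (the worked example in the DEFLATE
// specification) this yields the codes 010, 011, 100, 101, 110, 00, 1110, 1111.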
3563 
3564 // zlib-from-memory implementation for PNG reading
3565 //    because PNG allows splitting the zlib stream arbitrarily,
3566 //    and it's annoying structurally to have PNG call ZLIB call PNG,
3567 //    we require PNG read all the IDATs and combine them into a single
3568 //    memory buffer
3569 
3570 typedef struct
3571 {
3572    stbi_uc *zbuffer, *zbuffer_end;
3573    int num_bits;
3574    stbi__uint32 code_buffer;
3575 
3576    char *zout;
3577    char *zout_start;
3578    char *zout_end;
3579    int   z_expandable;
3580 
3581    stbi__zhuffman z_length, z_distance;
3582 } stbi__zbuf;
3583 
3584 stbi_inline static stbi_uc stbi__zget8(stbi__zbuf *z)
3585 {
3586    if (z->zbuffer >= z->zbuffer_end) return 0;
3587    return *z->zbuffer++;
3588 }
3589 
3590 static void stbi__fill_bits(stbi__zbuf *z)
3591 {
3592    do {
3593       STBI_ASSERT(z->code_buffer < (1U << z->num_bits));
3594       z->code_buffer |= (unsigned int) stbi__zget8(z) << z->num_bits;
3595       z->num_bits += 8;
3596    } while (z->num_bits <= 24);
3597 }
3598 
3599 stbi_inline static unsigned int stbi__zreceive(stbi__zbuf *z, int n)
3600 {
3601    unsigned int k;
3602    if (z->num_bits < n) stbi__fill_bits(z);
3603    k = z->code_buffer & ((1 << n) - 1);
3604    z->code_buffer >>= n;
3605    z->num_bits -= n;
3606    return k;
3607 }
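
// stbi__fill_bits and stbi__zreceive together form an LSB-first bit reader:
// new bytes are OR-ed in above the bits already buffered, and values are
// taken from the bottom. Below is a minimal standalone sketch of the same
// idea. The type sketch_bits and both function names are hypothetical; unlike
// the code above it refills one byte at a time, only as far as needed.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
   const unsigned char *p, *end;  /* unread input                       */
   unsigned long acc;             /* buffered bits, next bit in bit 0   */
   int nbits;                     /* number of valid bits in acc        */
} sketch_bits;

static void sketch_bits_init(sketch_bits *b, const unsigned char *data, size_t len)
{
   b->p = data;
   b->end = data + len;
   b->acc = 0;
   b->nbits = 0;
}

/* Hypothetical helper: read n bits (1..16), least-significant bit first. */
static unsigned sketch_get_bits(sketch_bits *b, int n)
{
   unsigned v;
   assert(n >= 1 && n <= 16);
   while (b->nbits < n) {
      /* past the end of input, feed zeros (the same policy as stbi__zget8) */
      unsigned long byte = (b->p < b->end) ? *b->p++ : 0;
      b->acc |= byte << b->nbits;   /* new byte goes above the buffered bits */
      b->nbits += 8;
   }
   v = (unsigned) (b->acc & ((1ul << n) - 1));
   b->acc >>= n;
   b->nbits -= n;
   return v;
}
```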
3608 
3609 static int stbi__zhuffman_decode_slowpath(stbi__zbuf *a, stbi__zhuffman *z)
3610 {
3611    int b,s,k;
3612    // not resolved by fast table, so compute it the slow way
3613    // use jpeg approach, which requires MSbits at top
3614    k = stbi__bit_reverse(a->code_buffer, 16);
3615    for (s=STBI__ZFAST_BITS+1; ; ++s)
3616       if (k < z->maxcode[s])
3617          break;
3618    if (s == 16) return -1; // invalid code!
3619    // code size is s, so:
3620    b = (k >> (16-s)) - z->firstcode[s] + z->firstsymbol[s];
3621    STBI_ASSERT(z->size[b] == s);
3622    a->code_buffer >>= s;
3623    a->num_bits -= s;
3624    return z->value[b];
3625 }
3626 
3627 stbi_inline static int stbi__zhuffman_decode(stbi__zbuf *a, stbi__zhuffman *z)
3628 {
3629    int b,s;
3630    if (a->num_bits < 16) stbi__fill_bits(a);
3631    b = z->fast[a->code_buffer & STBI__ZFAST_MASK];
3632    if (b) {
3633       s = b >> 9;
3634       a->code_buffer >>= s;
3635       a->num_bits -= s;
3636       return b & 511;
3637    }
3638    return stbi__zhuffman_decode_slowpath(a, z);
3639 }
3640 
3641 static int stbi__zexpand(stbi__zbuf *z, char *zout, int n)  // need to make room for n bytes
3642 {
3643    char *q;
3644    int cur, limit;
3645    z->zout = zout;
3646    if (!z->z_expandable) return stbi__err("output buffer limit","Corrupt PNG");
3647    cur   = (int) (z->zout     - z->zout_start);
3648    limit = (int) (z->zout_end - z->zout_start);
3649    while (cur + n > limit)
3650       limit *= 2;
3651    q = (char *) STBI_REALLOC(z->zout_start, limit);
3652    if (q == NULL) return stbi__err("outofmem", "Out of memory");
3653    z->zout_start = q;
3654    z->zout       = q + cur;
3655    z->zout_end   = q + limit;
3656    return 1;
3657 }
3658 
3659 static int stbi__zlength_base[31] = {
3660    3,4,5,6,7,8,9,10,11,13,
3661    15,17,19,23,27,31,35,43,51,59,
3662    67,83,99,115,131,163,195,227,258,0,0 };
3663 
3664 static int stbi__zlength_extra[31]=
3665 { 0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,0,0,0 };
3666 
3667 static int stbi__zdist_base[32] = { 1,2,3,4,5,7,9,13,17,25,33,49,65,97,129,193,
3668 257,385,513,769,1025,1537,2049,3073,4097,6145,8193,12289,16385,24577,0,0};
3669 
3670 static int stbi__zdist_extra[32] =
3671 { 0,0,0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13};
3672 
3673 static int stbi__parse_huffman_block(stbi__zbuf *a)
3674 {
3675    char *zout = a->zout;
3676    for(;;) {
3677       int z = stbi__zhuffman_decode(a, &a->z_length);
3678       if (z < 256) {
3679          if (z < 0) return stbi__err("bad huffman code","Corrupt PNG"); // error in huffman codes
3680          if (zout >= a->zout_end) {
3681             if (!stbi__zexpand(a, zout, 1)) return 0;
3682             zout = a->zout;
3683          }
3684          *zout++ = (char) z;
3685       } else {
3686          stbi_uc *p;
3687          int len,dist;
3688          if (z == 256) {
3689             a->zout = zout;
3690             return 1;
3691          }
3692          z -= 257;
3693          len = stbi__zlength_base[z];
3694          if (stbi__zlength_extra[z]) len += stbi__zreceive(a, stbi__zlength_extra[z]);
3695          z = stbi__zhuffman_decode(a, &a->z_distance);
3696          if (z < 0) return stbi__err("bad huffman code","Corrupt PNG");
3697          dist = stbi__zdist_base[z];
3698          if (stbi__zdist_extra[z]) dist += stbi__zreceive(a, stbi__zdist_extra[z]);
3699          if (zout - a->zout_start < dist) return stbi__err("bad dist","Corrupt PNG");
3700          if (zout + len > a->zout_end) {
3701             if (!stbi__zexpand(a, zout, len)) return 0;
3702             zout = a->zout;
3703          }
3704          p = (stbi_uc *) (zout - dist);
3705          if (dist == 1) { // run of one byte; common in images.
3706             stbi_uc v = *p;
3707             if (len) { do *zout++ = v; while (--len); }
3708          } else {
3709             if (len) { do *zout++ = *p++; while (--len); }
3710          }
3711       }
3712    }
3713 }
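
// The length/distance branch above copies bytes from earlier in the output,
// and the source range may overlap the bytes being written (dist < len),
// which is why it copies one byte at a time rather than calling memcpy.
// Below is a minimal sketch of that copy on a fixed-size buffer. The name
// sketch_copy_match is hypothetical, and it rejects a copy that does not fit
// instead of growing the buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: append `len` bytes copied from `dist` bytes back.
   *pos is the number of bytes already written to out[0..cap).
   Returns 1 on success, 0 if the reference or the output is out of range. */
static int sketch_copy_match(unsigned char *out, size_t *pos, size_t cap,
                             size_t dist, size_t len)
{
   const unsigned char *src;
   assert(*pos <= cap);
   if (dist == 0 || dist > *pos) return 0;   /* points before the start */
   if (len > cap - *pos) return 0;           /* not enough room         */
   src = out + *pos - dist;
   /* forward byte-by-byte copy: bytes written here may be read again later
      in the same loop when dist < len, which repeats the pattern */
   while (len--) out[(*pos)++] = *src++;
   return 1;
}
```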
3714 
3715 static int stbi__compute_huffman_codes(stbi__zbuf *a)
3716 {
3717    static stbi_uc length_dezigzag[19] = { 16,17,18,0,8,7,9,6,10,5,11,4,12,3,13,2,14,1,15 };
3718    stbi__zhuffman z_codelength;
3719    stbi_uc lencodes[286+32+137];//padding for maximum single op
3720    stbi_uc codelength_sizes[19];
3721    int i,n;
3722 
3723    int hlit  = stbi__zreceive(a,5) + 257;
3724    int hdist = stbi__zreceive(a,5) + 1;
3725    int hclen = stbi__zreceive(a,4) + 4;
3726 
3727    memset(codelength_sizes, 0, sizeof(codelength_sizes));
3728    for (i=0; i < hclen; ++i) {
3729       int s = stbi__zreceive(a,3);
3730       codelength_sizes[length_dezigzag[i]] = (stbi_uc) s;
3731    }
3732    if (!stbi__zbuild_huffman(&z_codelength, codelength_sizes, 19)) return 0;
3733 
3734    n = 0;
3735    while (n < hlit + hdist) {
3736       int c = stbi__zhuffman_decode(a, &z_codelength);
3737       if (c < 0 || c >= 19) return stbi__err("bad codelengths", "Corrupt PNG");
3738       if (c < 16)
3739          lencodes[n++] = (stbi_uc) c;
3740       else if (c == 16) {
3741          c = stbi__zreceive(a,2)+3;
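3742          if (n == 0) return stbi__err("bad codelengths", "Corrupt PNG"); memset(lencodes+n, lencodes[n-1], c); // code 16 repeats the previous length, so one must exist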
3743          n += c;
3744       } else if (c == 17) {
3745          c = stbi__zreceive(a,3)+3;
3746          memset(lencodes+n, 0, c);
3747          n += c;
3748       } else {
3749          STBI_ASSERT(c == 18);
3750          c = stbi__zreceive(a,7)+11;
3751          memset(lencodes+n, 0, c);
3752          n += c;
3753       }
3754    }
3755    if (n != hlit+hdist) return stbi__err("bad codelengths","Corrupt PNG");
3756    if (!stbi__zbuild_huffman(&a->z_length, lencodes, hlit)) return 0;
3757    if (!stbi__zbuild_huffman(&a->z_distance, lencodes+hlit, hdist)) return 0;
3758    return 1;
3759 }
3760 
3761 static int stbi__parse_uncomperssed_block(stbi__zbuf *a)
3762 {
3763    stbi_uc header[4];
3764    int len,nlen,k;
3765    if (a->num_bits & 7)
3766       stbi__zreceive(a, a->num_bits & 7); // discard
3767    // drain the bit-packed data into header
3768    k = 0;
3769    while (a->num_bits > 0) {
3770       header[k++] = (stbi_uc) (a->code_buffer & 255); // suppress MSVC run-time check
3771       a->code_buffer >>= 8;
3772       a->num_bits -= 8;
3773    }
3774    STBI_ASSERT(a->num_bits == 0);
3775    // now fill header the normal way
3776    while (k < 4)
3777       header[k++] = stbi__zget8(a);
3778    len  = header[1] * 256 + header[0];
3779    nlen = header[3] * 256 + header[2];
3780    if (nlen != (len ^ 0xffff)) return stbi__err("zlib corrupt","Corrupt PNG");
3781    if (a->zbuffer + len > a->zbuffer_end) return stbi__err("read past buffer","Corrupt PNG");
3782    if (a->zout + len > a->zout_end)
3783       if (!stbi__zexpand(a, a->zout, len)) return 0;
3784    memcpy(a->zout, a->zbuffer, len);
3785    a->zbuffer += len;
3786    a->zout += len;
3787    return 1;
3788 }
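
// A stored (uncompressed) block starts with a 16-bit little-endian length
// followed by its one's complement, and the function above rejects the block
// when the two disagree. Below is a minimal sketch of just that header check.
// The name sketch_stored_len is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: h[0..1] = LEN, h[2..3] = NLEN, both little-endian.
   Returns LEN when NLEN is its one's complement, otherwise -1. */
static long sketch_stored_len(const unsigned char h[4])
{
   long len  = h[1] * 256L + h[0];
   long nlen = h[3] * 256L + h[2];
   return (nlen == (len ^ 0xffffL)) ? len : -1;
}
```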
3789 
3790 static int stbi__parse_zlib_header(stbi__zbuf *a)
3791 {
3792    int cmf   = stbi__zget8(a);
3793    int cm    = cmf & 15;
3794    /* int cinfo = cmf >> 4; */
3795    int flg   = stbi__zget8(a);
3796    if ((cmf*256+flg) % 31 != 0) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
3797    if (flg & 32) return stbi__err("no preset dict","Corrupt PNG"); // preset dictionary not allowed in png
3798    if (cm != 8) return stbi__err("bad compression","Corrupt PNG"); // DEFLATE required for png
3799    // window = 1 << (8 + cinfo)... but who cares, we fully buffer output
3800    return 1;
3801 }
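
// The three header tests above can be tried on concrete byte pairs. Below is
// a minimal sketch taking the two header bytes directly; the name
// sketch_zlib_header_ok is hypothetical. 0x78 0x9C is a commonly seen zlib
// header: 0x789C is a multiple of 31, the preset-dictionary bit is clear, and
// the low nibble of 0x78 is 8 (DEFLATE).

```c
#include <assert.h>

/* Hypothetical helper: returns 1 when (cmf, flg) is a zlib header that a
   PNG stream may use, 0 otherwise. */
static int sketch_zlib_header_ok(unsigned cmf, unsigned flg)
{
   if ((cmf * 256u + flg) % 31u != 0) return 0;  /* header check value      */
   if (flg & 32u) return 0;                      /* preset dictionary set   */
   if ((cmf & 15u) != 8u) return 0;              /* method must be DEFLATE  */
   return 1;
}
```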
3802 
3803 // @TODO: should statically initialize these for optimal thread safety
3804 static stbi_uc stbi__zdefault_length[288], stbi__zdefault_distance[32];
3805 static void stbi__init_zdefaults(void)
3806 {
3807    int i;   // use <= to match clearly with spec
3808    for (i=0; i <= 143; ++i)     stbi__zdefault_length[i]   = 8;
3809    for (   ; i <= 255; ++i)     stbi__zdefault_length[i]   = 9;
3810    for (   ; i <= 279; ++i)     stbi__zdefault_length[i]   = 7;
3811    for (   ; i <= 287; ++i)     stbi__zdefault_length[i]   = 8;
3812 
3813    for (i=0; i <=  31; ++i)     stbi__zdefault_distance[i] = 5;
3814 }
3815 
3816 static int stbi__parse_zlib(stbi__zbuf *a, int parse_header)
3817 {
3818    int final, type;
3819    if (parse_header)
3820       if (!stbi__parse_zlib_header(a)) return 0;
3821    a->num_bits = 0;
3822    a->code_buffer = 0;
3823    do {
3824       final = stbi__zreceive(a,1);
3825       type = stbi__zreceive(a,2);
3826       if (type == 0) {
3827          if (!stbi__parse_uncomperssed_block(a)) return 0;
3828       } else if (type == 3) {
3829          return 0;
3830       } else {
3831          if (type == 1) {
3832             // use fixed code lengths
3833             if (!stbi__zdefault_distance[31]) stbi__init_zdefaults();
3834             if (!stbi__zbuild_huffman(&a->z_length  , stbi__zdefault_length  , 288)) return 0;
3835             if (!stbi__zbuild_huffman(&a->z_distance, stbi__zdefault_distance,  32)) return 0;
3836          } else {
3837             if (!stbi__compute_huffman_codes(a)) return 0;
3838          }
3839          if (!stbi__parse_huffman_block(a)) return 0;
3840       }
3841    } while (!final);
3842    return 1;
3843 }
3844 
3845 static int stbi__do_zlib(stbi__zbuf *a, char *obuf, int olen, int exp, int parse_header)
3846 {
3847    a->zout_start = obuf;
3848    a->zout       = obuf;
3849    a->zout_end   = obuf + olen;
3850    a->z_expandable = exp;
3851 
3852    return stbi__parse_zlib(a, parse_header);
3853 }
3854 
3855 STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen)
3856 {
3857    stbi__zbuf a;
3858    char *p = (char *) stbi__malloc(initial_size);
3859    if (p == NULL) return NULL;
3860    a.zbuffer = (stbi_uc *) buffer;
3861    a.zbuffer_end = (stbi_uc *) buffer + len;
3862    if (stbi__do_zlib(&a, p, initial_size, 1, 1)) {
3863       if (outlen) *outlen = (int) (a.zout - a.zout_start);
3864       return a.zout_start;
3865    } else {
3866       STBI_FREE(a.zout_start);
3867       return NULL;
3868    }
3869 }
3870 
3871 STBIDEF char *stbi_zlib_decode_malloc(char const *buffer, int len, int *outlen)
3872 {
3873    return stbi_zlib_decode_malloc_guesssize(buffer, len, 16384, outlen);
3874 }
3875 
3876 STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header)
3877 {
3878    stbi__zbuf a;
3879    char *p = (char *) stbi__malloc(initial_size);
3880    if (p == NULL) return NULL;
3881    a.zbuffer = (stbi_uc *) buffer;
3882    a.zbuffer_end = (stbi_uc *) buffer + len;
3883    if (stbi__do_zlib(&a, p, initial_size, 1, parse_header)) {
3884       if (outlen) *outlen = (int) (a.zout - a.zout_start);
3885       return a.zout_start;
3886    } else {
3887       STBI_FREE(a.zout_start);
3888       return NULL;
3889    }
3890 }
3891 
3892 STBIDEF int stbi_zlib_decode_buffer(char *obuffer, int olen, char const *ibuffer, int ilen)
3893 {
3894    stbi__zbuf a;
3895    a.zbuffer = (stbi_uc *) ibuffer;
3896    a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
3897    if (stbi__do_zlib(&a, obuffer, olen, 0, 1))
3898       return (int) (a.zout - a.zout_start);
3899    else
3900       return -1;
3901 }
3902 
3903 STBIDEF char *stbi_zlib_decode_noheader_malloc(char const *buffer, int len, int *outlen)
3904 {
3905    stbi__zbuf a;
3906    char *p = (char *) stbi__malloc(16384);
3907    if (p == NULL) return NULL;
3908    a.zbuffer = (stbi_uc *) buffer;
3909    a.zbuffer_end = (stbi_uc *) buffer+len;
3910    if (stbi__do_zlib(&a, p, 16384, 1, 0)) {
3911       if (outlen) *outlen = (int) (a.zout - a.zout_start);
3912       return a.zout_start;
3913    } else {
3914       STBI_FREE(a.zout_start);
3915       return NULL;
3916    }
3917 }
3918 
3919 STBIDEF int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen)
3920 {
3921    stbi__zbuf a;
3922    a.zbuffer = (stbi_uc *) ibuffer;
3923    a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
3924    if (stbi__do_zlib(&a, obuffer, olen, 0, 0))
3925       return (int) (a.zout - a.zout_start);
3926    else
3927       return -1;
3928 }
3929 #endif
3930 
3931 // public domain "baseline" PNG decoder   v0.10  Sean Barrett 2006-11-18
3932 //    simple implementation
3933 //      - only 8-bit samples
3934 //      - no CRC checking
3935 //      - allocates lots of intermediate memory
3936 //        - avoids problem of streaming data between subsystems
3937 //        - avoids explicit window management
3938 //    performance
3939 //      - uses stb_zlib, a PD zlib implementation with fast huffman decoding
3940 
3941 #ifndef STBI_NO_PNG
3942 typedef struct
3943 {
3944    stbi__uint32 length;
3945    stbi__uint32 type;
3946 } stbi__pngchunk;
3947 
3948 static stbi__pngchunk stbi__get_chunk_header(stbi__context *s)
3949 {
3950    stbi__pngchunk c;
3951    c.length = stbi__get32be(s);
3952    c.type   = stbi__get32be(s);
3953    return c;
3954 }
3955 
3956 static int stbi__check_png_header(stbi__context *s)
3957 {
3958    static stbi_uc png_sig[8] = { 137,80,78,71,13,10,26,10 };
3959    int i;
3960    for (i=0; i < 8; ++i)
3961       if (stbi__get8(s) != png_sig[i]) return stbi__err("bad png sig","Not a PNG");
3962    return 1;
3963 }
3964 
3965 typedef struct
3966 {
3967    stbi__context *s;
3968    stbi_uc *idata, *expanded, *out;
3969 } stbi__png;
3970 
3971 
3972 enum {
3973    STBI__F_none=0,
3974    STBI__F_sub=1,
3975    STBI__F_up=2,
3976    STBI__F_avg=3,
3977    STBI__F_paeth=4,
3978    // synthetic filters used for first scanline to avoid needing a dummy row of 0s
3979    STBI__F_avg_first,
3980    STBI__F_paeth_first
3981 };
3982 
3983 static stbi_uc first_row_filter[5] =
3984 {
3985    STBI__F_none,
3986    STBI__F_sub,
3987    STBI__F_none,
3988    STBI__F_avg_first,
3989    STBI__F_paeth_first
3990 };
3991 
3992 static int stbi__paeth(int a, int b, int c)
3993 {
3994    int p = a + b - c;
3995    int pa = abs(p-a);
3996    int pb = abs(p-b);
3997    int pc = abs(p-c);
3998    if (pa <= pb && pa <= pc) return a;
3999    if (pb <= pc) return b;
4000    return c;
4001 }
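// Illustration (not part of stb_image): a minimal sketch of how the Paeth
// predictor above can be used to undo PNG filter type 4 on one scanline.
// It assumes 1 byte per pixel and treats missing left/upper-left neighbours
// as 0. paeth_predict, paeth_unfilter_row and paeth_selfcheck are
// hypothetical names introduced for this example only.

```c
#include <assert.h>
#include <stdlib.h>

// Standalone copy of the predictor: pick whichever of a (left), b (up),
// c (upper-left) is closest to the estimate a + b - c.
static int paeth_predict(int a, int b, int c)
{
   int p  = a + b - c;
   int pa = abs(p - a);
   int pb = abs(p - b);
   int pc = abs(p - c);
   if (pa <= pb && pa <= pc) return a;
   if (pb <= pc) return b;
   return c;
}

// Undo the Paeth filter on one row of n single-byte pixels.
// raw: filtered bytes; prior: decoded row above; out: decoded result.
static void paeth_unfilter_row(unsigned char *out, const unsigned char *raw,
                               const unsigned char *prior, int n)
{
   int i;
   for (i = 0; i < n; ++i) {
      int left   = i ? out[i-1]   : 0;
      int up     = prior[i];
      int upleft = i ? prior[i-1] : 0;
      out[i] = (unsigned char) ((raw[i] + paeth_predict(left, up, upleft)) & 255);
   }
}

// Small self-check showing the expected behaviour on a 3-pixel row.
static void paeth_selfcheck(void)
{
   unsigned char prior[3] = { 10, 20, 30 };
   unsigned char raw[3]   = {  1,  2,  3 };
   unsigned char out[3];
   paeth_unfilter_row(out, raw, prior, 3);
   assert(out[0] == 11 && out[1] == 22 && out[2] == 33);
}
```

// In this row every byte ends up predicted from its "up" neighbour, so the
// decoded values are simply prior[i] + raw[i].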
4002 
4003 static stbi_uc stbi__depth_scale_table[9] = { 0, 0xff, 0x55, 0, 0x11, 0,0,0, 0x01 };
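// Illustration (not part of stb_image): the non-zero entries of the table
// above equal 255 / (2^depth - 1), which stretches a depth-bit gray sample
// to the full 0..255 range. The sketch below recomputes that factor and
// unpacks one byte of 2-bit samples. gray_scale_factor, unpack_2bit_gray
// and depth_scale_selfcheck are hypothetical names used only here.

```c
#include <assert.h>

// Multiplier that maps a sample of 'depth' bits (1, 2, 4 or 8) onto 0..255.
static unsigned char gray_scale_factor(int depth)
{
   return (unsigned char) (255 / ((1 << depth) - 1));
}

// Unpack one byte holding four 2-bit gray samples, most significant first,
// into four 8-bit values.
static void unpack_2bit_gray(unsigned char packed, unsigned char out[4])
{
   unsigned char scale = gray_scale_factor(2);   // 0x55
   out[0] = (unsigned char) (scale * ((packed >> 6)       ));
   out[1] = (unsigned char) (scale * ((packed >> 4) & 0x03));
   out[2] = (unsigned char) (scale * ((packed >> 2) & 0x03));
   out[3] = (unsigned char) (scale * ((packed     ) & 0x03));
}

// Small self-check: the byte 0x1B packs the samples 0, 1, 2, 3.
static void depth_scale_selfcheck(void)
{
   unsigned char out[4];
   unpack_2bit_gray(0x1B, out);
   assert(out[0] == 0 && out[1] == 85 && out[2] == 170 && out[3] == 255);
}
```

// Paletted images skip the scaling (a factor of 1), since the unpacked
// values are palette indices rather than intensities.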
4004 
4005 // create the png data from post-deflated data
4006 static int stbi__create_png_image_raw(stbi__png *a, stbi_uc *raw, stbi__uint32 raw_len, int out_n, stbi__uint32 x, stbi__uint32 y, int depth, int color)
4007 {
4008    stbi__context *s = a->s;
4009    stbi__uint32 i,j,stride = x*out_n;
4010    stbi__uint32 img_len, img_width_bytes;
4011    int k;
4012    int img_n = s->img_n; // copy it into a local for later
4013 
4014    STBI_ASSERT(out_n == s->img_n || out_n == s->img_n+1);
4015    if (x == 0 || y == 0 || out_n <= 0 || (out_n > (INT_MAX / x / y)))
4016        return stbi__err("Integer Overflow", "x or y incorrect");
4017    a->out = (stbi_uc *) stbi__malloc(x * y * out_n); // exactly x*y*out_n bytes; no padding past the end
4018    if (!a->out) return stbi__err("outofmem", "Out of memory");
4019 
4020    img_width_bytes = (((img_n * x * depth) + 7) >> 3);
4021    img_len = (img_width_bytes + 1) * y;
4022    if (s->img_x == x && s->img_y == y) {
4023       if (raw_len != img_len) return stbi__err("not enough pixels","Corrupt PNG");
4024    } else { // interlaced:
4025       if (raw_len < img_len) return stbi__err("not enough pixels","Corrupt PNG");
4026    }
4027 
4028    for (j=0; j < y; ++j) {
4029       stbi_uc *cur = a->out + stride*j;
4030       stbi_uc *prior = cur - stride;
4031       int filter = *raw++;
4032       int filter_bytes = img_n;
4033       int width = x;
4034       if (filter > 4)
4035          return stbi__err("invalid filter","Corrupt PNG");
4036 
4037       if (depth < 8) {
4038          STBI_ASSERT(img_width_bytes <= x);
4039          cur += x*out_n - img_width_bytes; // store output to the rightmost img_width_bytes bytes, so we can decode in place
4040          filter_bytes = 1;
4041          width = img_width_bytes;
4042       }
4043 
4044       // if first row, use special filter that doesn't sample previous row
4045       if (j == 0) filter = first_row_filter[filter];
4046 
4047       // handle first byte explicitly
4048       for (k=0; k < filter_bytes; ++k) {
4049          switch (filter) {
4050             case STBI__F_none       : cur[k] = raw[k]; break;
4051             case STBI__F_sub        : cur[k] = raw[k]; break;
4052             case STBI__F_up         : cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
4053             case STBI__F_avg        : cur[k] = STBI__BYTECAST(raw[k] + (prior[k]>>1)); break;
4054             case STBI__F_paeth      : cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(0,prior[k],0)); break;
4055             case STBI__F_avg_first  : cur[k] = raw[k]; break;
4056             case STBI__F_paeth_first: cur[k] = raw[k]; break;
4057          }
4058       }
4059 
4060       if (depth == 8) {
4061          if (img_n != out_n)
4062             cur[img_n] = 255; // first pixel
4063          raw += img_n;
4064          cur += out_n;
4065          prior += out_n;
4066       } else {
4067          raw += 1;
4068          cur += 1;
4069          prior += 1;
4070       }
4071 
4072       // this is a little gross, so that we don't switch per-pixel or per-component
4073       if (depth < 8 || img_n == out_n) {
4074          int nk = (width - 1)*img_n;
4075          #define CASE(f) \
4076              case f:     \
4077                 for (k=0; k < nk; ++k)
4078          switch (filter) {
4079             // "none" filter turns into a memcpy here; make that explicit.
4080             case STBI__F_none:         memcpy(cur, raw, nk); break;
4081             CASE(STBI__F_sub)          cur[k] = STBI__BYTECAST(raw[k] + cur[k-filter_bytes]); break;
4082             CASE(STBI__F_up)           cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
4083             CASE(STBI__F_avg)          cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-filter_bytes])>>1)); break;
4084             CASE(STBI__F_paeth)        cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],prior[k],prior[k-filter_bytes])); break;
4085             CASE(STBI__F_avg_first)    cur[k] = STBI__BYTECAST(raw[k] + (cur[k-filter_bytes] >> 1)); break;
4086             CASE(STBI__F_paeth_first)  cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],0,0)); break;
4087          }
4088          #undef CASE
4089          raw += nk;
4090       } else {
4091          STBI_ASSERT(img_n+1 == out_n);
4092          #define CASE(f) \
4093              case f:     \
4094                 for (i=x-1; i >= 1; --i, cur[img_n]=255,raw+=img_n,cur+=out_n,prior+=out_n) \
4095                    for (k=0; k < img_n; ++k)
4096          switch (filter) {
4097             CASE(STBI__F_none)         cur[k] = raw[k]; break;
4098             CASE(STBI__F_sub)          cur[k] = STBI__BYTECAST(raw[k] + cur[k-out_n]); break;
4099             CASE(STBI__F_up)           cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
4100             CASE(STBI__F_avg)          cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-out_n])>>1)); break;
4101             CASE(STBI__F_paeth)        cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-out_n],prior[k],prior[k-out_n])); break;
4102             CASE(STBI__F_avg_first)    cur[k] = STBI__BYTECAST(raw[k] + (cur[k-out_n] >> 1)); break;
4103             CASE(STBI__F_paeth_first)  cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-out_n],0,0)); break;
4104          }
4105          #undef CASE
4106       }
4107    }
4108 
4109    // we make a separate pass to expand bits to pixels; for performance,
4110    // this could run two scanlines behind the above code, so it won't
4111    // interfere with filtering but will still be in the cache.
4112    if (depth < 8) {
4113       for (j=0; j < y; ++j) {
4114          stbi_uc *cur = a->out + stride*j;
4115          stbi_uc *in  = a->out + stride*j + x*out_n - img_width_bytes;
4116          // unpack 1/2/4-bit into an 8-bit buffer. allows us to keep the common 8-bit path optimal at minimal cost for 1/2/4-bit
4117          // PNG guarantees byte alignment; if width is not a multiple of 8/4/2 we'll decode dummy trailing data that will be skipped in the later loop
4118          stbi_uc scale = (color == 0) ? stbi__depth_scale_table[depth] : 1; // scale grayscale values to 0..255 range
4119 
4120          // note that the final byte might overshoot and write more data than desired.
4121          // we can allocate enough data that this never writes out of memory, but it
4122          // could also overwrite the next scanline. can it overwrite non-empty data
4123          // on the next scanline? yes, consider 1-pixel-wide scanlines with 1-bit-per-pixel.
4124          // so we need to explicitly clamp the final ones
4125 
4126          if (depth == 4) {
4127             for (k=x*img_n; k >= 2; k-=2, ++in) {
4128                *cur++ = scale * ((*in >> 4)       );
4129                *cur++ = scale * ((*in     ) & 0x0f);
4130             }
4131             if (k > 0) *cur++ = scale * ((*in >> 4)       );
4132          } else if (depth == 2) {
4133             for (k=x*img_n; k >= 4; k-=4, ++in) {
4134                *cur++ = scale * ((*in >> 6)       );
4135                *cur++ = scale * ((*in >> 4) & 0x03);
4136                *cur++ = scale * ((*in >> 2) & 0x03);
4137                *cur++ = scale * ((*in     ) & 0x03);
4138             }
4139             if (k > 0) *cur++ = scale * ((*in >> 6)       );
4140             if (k > 1) *cur++ = scale * ((*in >> 4) & 0x03);
4141             if (k > 2) *cur++ = scale * ((*in >> 2) & 0x03);
4142          } else if (depth == 1) {
4143             for (k=x*img_n; k >= 8; k-=8, ++in) {
4144                *cur++ = scale * ((*in >> 7)       );
4145                *cur++ = scale * ((*in >> 6) & 0x01);
4146                *cur++ = scale * ((*in >> 5) & 0x01);
4147                *cur++ = scale * ((*in >> 4) & 0x01);
4148                *cur++ = scale * ((*in >> 3) & 0x01);
4149                *cur++ = scale * ((*in >> 2) & 0x01);
4150                *cur++ = scale * ((*in >> 1) & 0x01);
4151                *cur++ = scale * ((*in     ) & 0x01);
4152             }
4153             if (k > 0) *cur++ = scale * ((*in >> 7)       );
4154             if (k > 1) *cur++ = scale * ((*in >> 6) & 0x01);
4155             if (k > 2) *cur++ = scale * ((*in >> 5) & 0x01);
4156             if (k > 3) *cur++ = scale * ((*in >> 4) & 0x01);
4157             if (k > 4) *cur++ = scale * ((*in >> 3) & 0x01);
4158             if (k > 5) *cur++ = scale * ((*in >> 2) & 0x01);
4159             if (k > 6) *cur++ = scale * ((*in >> 1) & 0x01);
4160          }
4161          if (img_n != out_n) {
4162             int q;
4163             // insert alpha = 255
4164             cur = a->out + stride*j;
4165             if (img_n == 1) {
4166                for (q=x-1; q >= 0; --q) {
4167                   cur[q*2+1] = 255;
4168                   cur[q*2+0] = cur[q];
4169                }
4170             } else {
4171                STBI_ASSERT(img_n == 3);
4172                for (q=x-1; q >= 0; --q) {
4173                   cur[q*4+3] = 255;
4174                   cur[q*4+2] = cur[q*3+2];
4175                   cur[q*4+1] = cur[q*3+1];
4176                   cur[q*4+0] = cur[q*3+0];
4177                }
4178             }
4179          }
4180       }
4181    }
4182 
4183    return 1;
4184 }
4185 
4186 static int stbi__create_png_image(stbi__png *a, stbi_uc *image_data, stbi__uint32 image_data_len, int out_n, int depth, int color, int interlaced)
4187 {
4188    stbi_uc *final;
4189    int p;
4190    if (!interlaced)
4191       return stbi__create_png_image_raw(a, image_data, image_data_len, out_n, a->s->img_x, a->s->img_y, depth, color);
4192 
4193    // de-interlacing
4194    if (a->s->img_x == 0 || a->s->img_y == 0 || out_n <= 0
4195          || (out_n > (INT_MAX / a->s->img_x / a->s->img_y)))
4196       return stbi__err("Integer Overflow", "x or y incorrect");
4197 
4198    final = (stbi_uc *) stbi__malloc(a->s->img_x * a->s->img_y * out_n);
4199    if (final == NULL) return stbi__err("outofmem", "Out of memory");
4200    for (p=0; p < 7; ++p) {
4201       int xorig[] = { 0,4,0,2,0,1,0 };
4202       int yorig[] = { 0,0,4,0,2,0,1 };
4203       int xspc[]  = { 8,8,4,4,2,2,1 };
4204       int yspc[]  = { 8,8,8,4,4,2,2 };
4205       int i,j,x,y;
4206       // pass1_x[4] = 0, pass1_x[5] = 1, pass1_x[12] = 1
4207       x = (a->s->img_x - xorig[p] + xspc[p]-1) / xspc[p];
4208       y = (a->s->img_y - yorig[p] + yspc[p]-1) / yspc[p];
4209       if (x && y) {
4210          stbi__uint32 img_len = ((((a->s->img_n * x * depth) + 7) >> 3) + 1) * y;
4211          if (!stbi__create_png_image_raw(a, image_data, image_data_len, out_n, x, y, depth, color)) {
4212             STBI_FREE(final);
4213             return 0;
4214          }
4215          for (j=0; j < y; ++j) {
4216             for (i=0; i < x; ++i) {
4217                int out_y = j*yspc[p]+yorig[p];
4218                int out_x = i*xspc[p]+xorig[p];
4219                memcpy(final + out_y*a->s->img_x*out_n + out_x*out_n,
4220                       a->out + (j*x+i)*out_n, out_n);
4221             }
4222          }
4223          STBI_FREE(a->out);
4224          image_data += img_len;
4225          image_data_len -= img_len;
4226       }
4227    }
4228    a->out = final;
4229 
4230    return 1;
4231 }
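// Illustration (not part of stb_image): a minimal sketch of the Adam7 pass
// geometry used by the de-interlacing loop above. It reuses the same
// origin/spacing tables and the same rounding-up division to size each
// pass, then checks that the seven passes together cover every pixel
// exactly once. All adam7_* names are hypothetical and exist only here.

```c
#include <assert.h>

// Origin and spacing of each pass inside an 8x8 tile.
static const int adam7_xorig[7] = { 0,4,0,2,0,1,0 };
static const int adam7_yorig[7] = { 0,0,4,0,2,0,1 };
static const int adam7_xspc[7]  = { 8,8,4,4,2,2,1 };
static const int adam7_yspc[7]  = { 8,8,8,4,4,2,2 };

// Width and height, in pixels, of the sub-image carried by pass p (0..6).
// Either value may be 0, in which case the pass holds no data.
static void adam7_pass_size(int p, int img_x, int img_y, int *px, int *py)
{
   *px = (img_x - adam7_xorig[p] + adam7_xspc[p] - 1) / adam7_xspc[p];
   *py = (img_y - adam7_yorig[p] + adam7_yspc[p] - 1) / adam7_yspc[p];
}

// Number of pixels summed over all seven passes.
static long adam7_total_pixels(int img_x, int img_y)
{
   long total = 0;
   int p, px, py;
   for (p = 0; p < 7; ++p) {
      adam7_pass_size(p, img_x, img_y, &px, &py);
      total += (long) px * py;
   }
   return total;
}

// Small self-check: the last pass of an 8x8 image is 8 wide and 4 tall.
static void adam7_selfcheck(void)
{
   int px, py;
   adam7_pass_size(6, 8, 8, &px, &py);
   assert(px == 8 && py == 4);
   assert(adam7_total_pixels(8, 8) == 64);
}
```

// For a 5x3 image the passes hold 1, 1, 0, 1, 3, 4 and 5 pixels, 15 in
// total; the third pass is empty because the image has fewer than 5 rows.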
4232 
4233 static int stbi__compute_transparency(stbi__png *z, stbi_uc tc[3], int out_n)
4234 {
4235    stbi__context *s = z->s;
4236    stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4237    stbi_uc *p = z->out;
4238 
4239    // compute color-based transparency, assuming we've
4240    // already got 255 as the alpha value in the output
4241    STBI_ASSERT(out_n == 2 || out_n == 4);
4242 
4243    if (out_n == 2) {
4244       for (i=0; i < pixel_count; ++i) {
4245          p[1] = (p[0] == tc[0] ? 0 : 255);
4246          p += 2;
4247       }
4248    } else {
4249       for (i=0; i < pixel_count; ++i) {
4250          if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
4251             p[3] = 0;
4252          p += 4;
4253       }
4254    }
4255    return 1;
4256 }
4257 
4258 static int stbi__expand_png_palette(stbi__png *a, stbi_uc *palette, int len, int pal_img_n)
4259 {
4260    stbi__uint32 i, pixel_count = a->s->img_x * a->s->img_y;
4261    stbi_uc *p, *temp_out, *orig = a->out;
4262 
4263    if(a->s->img_x == 0 || a->s->img_y == 0 || pal_img_n > (INT_MAX / a->s->img_x / a->s->img_y))
4264        return stbi__err("Integer Overflow", "x or y incorrect");
4265    p = (stbi_uc *) stbi__malloc(pixel_count * pal_img_n);
4266    if (p == NULL) return stbi__err("outofmem", "Out of memory");
4267 
4268    // between here and free(out) below, exiting would leak
4269    temp_out = p;
4270 
4271    if (pal_img_n == 3) {
4272       for (i=0; i < pixel_count; ++i) {
4273          int n = orig[i]*4;
4274          p[0] = palette[n  ];
4275          p[1] = palette[n+1];
4276          p[2] = palette[n+2];
4277          p += 3;
4278       }
4279    } else {
4280       for (i=0; i < pixel_count; ++i) {
4281          int n = orig[i]*4;
4282          p[0] = palette[n  ];
4283          p[1] = palette[n+1];
4284          p[2] = palette[n+2];
4285          p[3] = palette[n+3];
4286          p += 4;
4287       }
4288    }
4289    STBI_FREE(a->out);
4290    a->out = temp_out;
4291 
4292    STBI_NOTUSED(len);
4293 
4294    return 1;
4295 }
4296 
4297 static int stbi__unpremultiply_on_load = 0;
4298 static int stbi__de_iphone_flag = 0;
4299 
4300 STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply)
4301 {
4302    stbi__unpremultiply_on_load = flag_true_if_should_unpremultiply;
4303 }
4304 
4305 STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert)
4306 {
4307    stbi__de_iphone_flag = flag_true_if_should_convert;
4308 }
4309 
4310 static void stbi__de_iphone(stbi__png *z)
4311 {
4312    stbi__context *s = z->s;
4313    stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4314    stbi_uc *p = z->out;
4315 
4316    if (s->img_out_n == 3) {  // convert bgr to rgb
4317       for (i=0; i < pixel_count; ++i) {
4318          stbi_uc t = p[0];
4319          p[0] = p[2];
4320          p[2] = t;
4321          p += 3;
4322       }
4323    } else {
4324       STBI_ASSERT(s->img_out_n == 4);
4325       if (stbi__unpremultiply_on_load) {
4326          // convert bgr to rgb and unpremultiply
4327          for (i=0; i < pixel_count; ++i) {
4328             stbi_uc a = p[3];
4329             stbi_uc t = p[0];
4330             if (a) {
4331                p[0] = p[2] * 255 / a;
4332                p[1] = p[1] * 255 / a;
4333                p[2] =  t   * 255 / a;
4334             } else {
4335                p[0] = p[2];
4336                p[2] = t;
4337             }
4338             p += 4;
4339          }
4340       } else {
4341          // convert bgr to rgb
4342          for (i=0; i < pixel_count; ++i) {
4343             stbi_uc t = p[0];
4344             p[0] = p[2];
4345             p[2] = t;
4346             p += 4;
4347          }
4348       }
4349    }
4350 }
4351 
4352 #define STBI__PNG_TYPE(a,b,c,d)  (((a) << 24) + ((b) << 16) + ((c) << 8) + (d))
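// Illustration (not part of stb_image): a minimal sketch of how a four
// letter chunk name packs into a 32-bit big-endian value, and how the
// "critical chunk" test used by the parser falls out of it. In PNG, a
// lowercase first letter marks a chunk as ancillary; lowercase ASCII has
// bit 5 set, which is bit 29 of the packed type. PNG_TYPE,
// png_chunk_is_critical and png_type_selfcheck are hypothetical names.

```c
#include <assert.h>

// Same packing as the macro above, written with unsigned arithmetic.
#define PNG_TYPE(a,b,c,d) \
   (((unsigned) (a) << 24) + ((unsigned) (b) << 16) + ((unsigned) (c) << 8) + (unsigned) (d))

// Non-zero when the chunk must be understood by the decoder
// (first letter uppercase, so bit 29 of the packed type is clear).
static int png_chunk_is_critical(unsigned type)
{
   return (type & (1u << 29)) == 0;
}

// Small self-check on two well-known chunk names.
static void png_type_selfcheck(void)
{
   assert(PNG_TYPE('I','H','D','R') == 0x49484452u);
   assert( png_chunk_is_critical(PNG_TYPE('I','H','D','R')));
   assert(!png_chunk_is_critical(PNG_TYPE('t','E','X','t')));
}
```

// An unknown ancillary chunk such as tEXt can be skipped safely, while an
// unknown critical chunk has to be reported as an error.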
4353 
4354 static int stbi__parse_png_file(stbi__png *z, int scan, int req_comp)
4355 {
4356    stbi_uc palette[1024], pal_img_n=0;
4357    stbi_uc has_trans=0, tc[3];
4358    stbi__uint32 ioff=0, idata_limit=0, i, pal_len=0;
4359    int first=1,k,interlace=0, color=0, depth=0, is_iphone=0;
4360    stbi__context *s = z->s;
4361 
4362    z->expanded = NULL;
4363    z->idata = NULL;
4364    z->out = NULL;
4365 
4366    if (!stbi__check_png_header(s)) return 0;
4367 
4368    if (scan == STBI__SCAN_type) return 1;
4369 
4370    for (;;) {
4371       stbi__pngchunk c = stbi__get_chunk_header(s);
4372       switch (c.type) {
4373          case STBI__PNG_TYPE('C','g','B','I'):
4374             is_iphone = 1;
4375             stbi__skip(s, c.length);
4376             break;
4377          case STBI__PNG_TYPE('I','H','D','R'): {
4378             int comp,filter;
4379             if (!first) return stbi__err("multiple IHDR","Corrupt PNG");
4380             first = 0;
4381             if (c.length != 13) return stbi__err("bad IHDR len","Corrupt PNG");
4382             s->img_x = stbi__get32be(s); if (s->img_x > (1 << 24)) return stbi__err("too large","Very large image (corrupt?)");
4383             s->img_y = stbi__get32be(s); if (s->img_y > (1 << 24)) return stbi__err("too large","Very large image (corrupt?)");
4384             depth = stbi__get8(s);  if (depth != 1 && depth != 2 && depth != 4 && depth != 8)  return stbi__err("1/2/4/8-bit only","PNG not supported: 1/2/4/8-bit only");
4385             color = stbi__get8(s);  if (color > 6)         return stbi__err("bad ctype","Corrupt PNG");
4386             if (color == 3) pal_img_n = 3; else if (color & 1) return stbi__err("bad ctype","Corrupt PNG");
4387             comp  = stbi__get8(s);  if (comp) return stbi__err("bad comp method","Corrupt PNG");
4388             filter= stbi__get8(s);  if (filter) return stbi__err("bad filter method","Corrupt PNG");
4389             interlace = stbi__get8(s); if (interlace>1) return stbi__err("bad interlace method","Corrupt PNG");
4390             if (!s->img_x || !s->img_y) return stbi__err("0-pixel image","Corrupt PNG");
4391             if (!pal_img_n) {
4392                s->img_n = (color & 2 ? 3 : 1) + (color & 4 ? 1 : 0);
4393                if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
4394                if (scan == STBI__SCAN_header) return 1;
4395             } else {
4396                // if paletted, then pal_img_n is our final components, and
4397                // img_n is # components to decompress/filter.
4398                s->img_n = 1;
4399                if ((1 << 30) / s->img_x / 4 < s->img_y) return stbi__err("too large","Corrupt PNG");
4400                // if SCAN_header, have to scan to see if we have a tRNS
4401             }
4402             break;
4403          }
4404 
4405          case STBI__PNG_TYPE('P','L','T','E'):  {
4406             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4407             if (c.length > 256*3) return stbi__err("invalid PLTE","Corrupt PNG");
4408             pal_len = c.length / 3;
4409             if (pal_len * 3 != c.length) return stbi__err("invalid PLTE","Corrupt PNG");
4410             for (i=0; i < pal_len; ++i) {
4411                palette[i*4+0] = stbi__get8(s);
4412                palette[i*4+1] = stbi__get8(s);
4413                palette[i*4+2] = stbi__get8(s);
4414                palette[i*4+3] = 255;
4415             }
4416             break;
4417          }
4418 
4419          case STBI__PNG_TYPE('t','R','N','S'): {
4420             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4421             if (z->idata) return stbi__err("tRNS after IDAT","Corrupt PNG");
4422             if (pal_img_n) {
4423                if (scan == STBI__SCAN_header) { s->img_n = 4; return 1; }
4424                if (pal_len == 0) return stbi__err("tRNS before PLTE","Corrupt PNG");
4425                if (c.length > pal_len) return stbi__err("bad tRNS len","Corrupt PNG");
4426                pal_img_n = 4;
4427                for (i=0; i < c.length; ++i)
4428                   palette[i*4+3] = stbi__get8(s);
4429             } else {
4430                if (!(s->img_n & 1)) return stbi__err("tRNS with alpha","Corrupt PNG");
4431                if (c.length != (stbi__uint32) s->img_n*2) return stbi__err("bad tRNS len","Corrupt PNG");
4432                has_trans = 1;
4433                for (k=0; k < s->img_n; ++k)
4434                   tc[k] = (stbi_uc) (stbi__get16be(s) & 255) * stbi__depth_scale_table[depth]; // non 8-bit images will be larger
4435             }
4436             break;
4437          }
4438 
4439          case STBI__PNG_TYPE('I','D','A','T'): {
4440             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4441             if (pal_img_n && !pal_len) return stbi__err("no PLTE","Corrupt PNG");
4442             if (scan == STBI__SCAN_header) { s->img_n = pal_img_n; return 1; }
4443             if ((int)(ioff + c.length) < (int)ioff) return 0;
4444             if (ioff + c.length > idata_limit) {
4445                stbi_uc *p;
4446                if (idata_limit == 0) idata_limit = c.length > 4096 ? c.length : 4096;
4447                while (ioff + c.length > idata_limit)
4448                   idata_limit *= 2;
4449                p = (stbi_uc *) STBI_REALLOC(z->idata, idata_limit); if (p == NULL) return stbi__err("outofmem", "Out of memory");
4450                z->idata = p;
4451             }
4452             if (!stbi__getn(s, z->idata+ioff,c.length)) return stbi__err("outofdata","Corrupt PNG");
4453             ioff += c.length;
4454             break;
4455          }
4456 
4457          case STBI__PNG_TYPE('I','E','N','D'): {
4458             stbi__uint32 raw_len, bpl;
4459             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4460             if (scan != STBI__SCAN_load) return 1;
4461             if (z->idata == NULL) return stbi__err("no IDAT","Corrupt PNG");
4462             if (depth > (INT_MAX - 7) / s->img_x)
4463                 return stbi__err("Bad x","Bad x");
4464             // initial guess for decoded data size to avoid unnecessary reallocs
4465             bpl = (s->img_x * depth + 7) / 8; // bytes per line, per component
4466             if (bpl > (INT_MAX - s->img_y) / s->img_n / s->img_y)
4467                 return stbi__err("Integer Overflow","y incorrect");
4468             raw_len = bpl * s->img_y * s->img_n /* pixels */ + s->img_y /* filter mode per row */;
4469             z->expanded = (stbi_uc *) stbi_zlib_decode_malloc_guesssize_headerflag((char *) z->idata, ioff, raw_len, (int *) &raw_len, !is_iphone);
4470             if (z->expanded == NULL) return 0; // zlib should set error
4471             STBI_FREE(z->idata); z->idata = NULL;
4472             if ((req_comp == s->img_n+1 && req_comp != 3 && !pal_img_n) || has_trans)
4473                s->img_out_n = s->img_n+1;
4474             else
4475                s->img_out_n = s->img_n;
4476             if (!stbi__create_png_image(z, z->expanded, raw_len, s->img_out_n, depth, color, interlace)) return 0;
4477             if (has_trans)
4478                if (!stbi__compute_transparency(z, tc, s->img_out_n)) return 0;
4479             if (is_iphone && stbi__de_iphone_flag && s->img_out_n > 2)
4480                stbi__de_iphone(z);
4481             if (pal_img_n) {
4482                // pal_img_n == 3 or 4
4483                s->img_n = pal_img_n; // record the actual colors we had
4484                s->img_out_n = pal_img_n;
4485                if (req_comp >= 3) s->img_out_n = req_comp;
4486                if (!stbi__expand_png_palette(z, palette, pal_len, s->img_out_n))
4487                   return 0;
4488             }
4489             STBI_FREE(z->expanded); z->expanded = NULL;
4490             return 1;
4491          }
4492 
4493          default:
4494             // if critical, fail
4495             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4496             if ((c.type & (1 << 29)) == 0) {
4497                #ifndef STBI_NO_FAILURE_STRINGS
4498                // not threadsafe
4499                static char invalid_chunk[] = "XXXX PNG chunk not known";
4500                invalid_chunk[0] = STBI__BYTECAST(c.type >> 24);
4501                invalid_chunk[1] = STBI__BYTECAST(c.type >> 16);
4502                invalid_chunk[2] = STBI__BYTECAST(c.type >>  8);
4503                invalid_chunk[3] = STBI__BYTECAST(c.type >>  0);
4504                #endif
4505                return stbi__err(invalid_chunk, "PNG not supported: unknown PNG chunk type");
4506             }
4507             stbi__skip(s, c.length);
4508             break;
4509       }
4510       // end of PNG chunk, read and skip CRC
4511       stbi__get32be(s);
4512    }
4513 }
4514 
4515 static unsigned char *stbi__do_png(stbi__png *p, int *x, int *y, int *n, int req_comp)
4516 {
4517    unsigned char *result=NULL;
4518    if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
4519    if (stbi__parse_png_file(p, STBI__SCAN_load, req_comp)) {
4520       result = p->out;
4521       p->out = NULL;
4522       if (req_comp && req_comp != p->s->img_out_n) {
4523          result = stbi__convert_format(result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
4524          p->s->img_out_n = req_comp;
4525          if (result == NULL) return result;
4526       }
4527       *x = p->s->img_x;
4528       *y = p->s->img_y;
4529       if (n) *n = p->s->img_out_n;
4530    }
4531    STBI_FREE(p->out);      p->out      = NULL;
4532    STBI_FREE(p->expanded); p->expanded = NULL;
4533    STBI_FREE(p->idata);    p->idata    = NULL;
4534 
4535    return result;
4536 }
4537 
4538 static unsigned char *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
4539 {
4540    stbi__png p;
4541    p.s = s;
4542    return stbi__do_png(&p, x,y,comp,req_comp);
4543 }
4544 
4545 static int stbi__png_test(stbi__context *s)
4546 {
4547    int r;
4548    r = stbi__check_png_header(s);
4549    stbi__rewind(s);
4550    return r;
4551 }
4552 
4553 static int stbi__png_info_raw(stbi__png *p, int *x, int *y, int *comp)
4554 {
4555    if (!stbi__parse_png_file(p, STBI__SCAN_header, 0)) {
4556       stbi__rewind( p->s );
4557       return 0;
4558    }
4559    if (x) *x = p->s->img_x;
4560    if (y) *y = p->s->img_y;
4561    if (comp) *comp = p->s->img_n;
4562    return 1;
4563 }
4564 
4565 static int stbi__png_info(stbi__context *s, int *x, int *y, int *comp)
4566 {
4567    stbi__png p;
4568    p.s = s;
4569    return stbi__png_info_raw(&p, x, y, comp);
4570 }
4571 #endif
4572 
4573 // Microsoft/Windows BMP image
4574 
4575 #ifndef STBI_NO_BMP
4576 static int stbi__bmp_test_raw(stbi__context *s)
4577 {
4578    int r;
4579    int sz;
4580    if (stbi__get8(s) != 'B') return 0;
4581    if (stbi__get8(s) != 'M') return 0;
4582    stbi__get32le(s); // discard filesize
4583    stbi__get16le(s); // discard reserved
4584    stbi__get16le(s); // discard reserved
4585    stbi__get32le(s); // discard data offset
4586    sz = stbi__get32le(s);
4587    r = (sz == 12 || sz == 40 || sz == 56 || sz == 108 || sz == 124);
4588    return r;
4589 }
4590 
4591 static int stbi__bmp_test(stbi__context *s)
4592 {
4593    int r = stbi__bmp_test_raw(s);
4594    stbi__rewind(s);
4595    return r;
4596 }
4597 
4598 
4599 // returns 0..31 for the highest set bit, or -1 if z is 0
4600 static int stbi__high_bit(unsigned int z)
4601 {
4602    int n=0;
4603    if (z == 0) return -1;
4604    if (z >= 0x10000) n += 16, z >>= 16;
4605    if (z >= 0x00100) n +=  8, z >>=  8;
4606    if (z >= 0x00010) n +=  4, z >>=  4;
4607    if (z >= 0x00004) n +=  2, z >>=  2;
4608    if (z >= 0x00002) n +=  1, z >>=  1;
4609    return n;
4610 }
4611 
stbi__bitcount(unsigned int a)4612 static int stbi__bitcount(unsigned int a)
4613 {
4614    a = (a & 0x55555555) + ((a >>  1) & 0x55555555); // max 2
4615    a = (a & 0x33333333) + ((a >>  2) & 0x33333333); // max 4
4616    a = (a + (a >> 4)) & 0x0f0f0f0f; // max 8 per 4, now 8 bits
4617    a = (a + (a >> 8)); // max 16 per 8 bits
4618    a = (a + (a >> 16)); // max 32 per 8 bits
4619    return a & 0xff;
4620 }
4621 
stbi__shiftsigned(int v,int shift,int bits)4622 static int stbi__shiftsigned(int v, int shift, int bits)
4623 {
4624    int result;
4625    int z=0;
4626 
4627    if (shift < 0) v <<= -shift;
4628    else v >>= shift;
4629    result = v;
4630 
4631    z = bits;
4632    while (z < 8) {
4633       result += v >> z;
4634       z += bits;
4635    }
4636    return result;
4637 }
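
// Illustrative sketch, not part of stb_image: the three helpers above combine to
// turn one masked channel of a 16- or 32-bit BMP pixel into an 8-bit value. The
// version below uses simple loops instead of the branch-free forms, and every
// example_* name is hypothetical. It assumes the masked value fits in an int,
// which holds for the 5-5-5 masks used here.

```c
// Index of the highest set bit, or -1 for zero (loop form).
static int example_high_bit(unsigned int z)
{
   int n = -1;
   while (z) { ++n; z >>= 1; }
   return n;
}

// Number of set bits (loop form).
static int example_bitcount(unsigned int a)
{
   int n = 0;
   while (a) { n += (int) (a & 1u); a >>= 1; }
   return n;
}

// Align the top bit of v to bit 7, then replicate the value downward.
static int example_shiftsigned(int v, int shift, int bits)
{
   int result, z;
   if (shift < 0) v <<= -shift;
   else v >>= shift;
   result = v;
   for (z = bits; z < 8; z += bits)
      result += v >> z;
   return result;
}

// Extract one channel of 'pixel' selected by 'mask' as an 8-bit value.
static unsigned char example_channel(unsigned int pixel, unsigned int mask)
{
   int shift, bits;
   if (mask == 0) return 0;                 // no bits to extract
   shift = example_high_bit(mask) - 7;      // right shift that puts the top bit at #7
   bits  = example_bitcount(mask);
   return (unsigned char) (example_shiftsigned((int) (pixel & mask), shift, bits) & 255);
}
```

// With the 5-5-5 red mask 31u << 10, a fully saturated channel maps to 255 and a
// half-intensity channel (16 of 31) maps to 132.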

static stbi_uc *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   stbi_uc *out;
   unsigned int mr=0,mg=0,mb=0,ma=0, all_a=255;
   stbi_uc pal[256][4];
   int psize=0,i,j,compress=0,width;
   int bpp, flip_vertically, pad, target, offset, hsz;
   if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M') return stbi__errpuc("not BMP", "Corrupt BMP");
   stbi__get32le(s); // discard filesize
   stbi__get16le(s); // discard reserved
   stbi__get16le(s); // discard reserved
   offset = stbi__get32le(s);
   hsz = stbi__get32le(s);
   if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) return stbi__errpuc("unknown BMP", "BMP type not supported: unknown");
   if (hsz == 12) {
      s->img_x = stbi__get16le(s);
      s->img_y = stbi__get16le(s);
   } else {
      s->img_x = stbi__get32le(s);
      s->img_y = stbi__get32le(s);
   }
   if (stbi__get16le(s) != 1) return stbi__errpuc("bad BMP", "bad BMP");
   bpp = stbi__get16le(s);
   if (bpp == 1) return stbi__errpuc("monochrome", "BMP type not supported: 1-bit");
   flip_vertically = ((int) s->img_y) > 0;
   s->img_y = abs((int) s->img_y);
   if (hsz == 12) {
      if (bpp < 24)
         psize = (offset - 14 - 24) / 3;
   } else {
      compress = stbi__get32le(s);
      if (compress == 1 || compress == 2) return stbi__errpuc("BMP RLE", "BMP type not supported: RLE");
      stbi__get32le(s); // discard sizeof
      stbi__get32le(s); // discard hres
      stbi__get32le(s); // discard vres
      stbi__get32le(s); // discard colorsused
      stbi__get32le(s); // discard max important
      if (hsz == 40 || hsz == 56) {
         if (hsz == 56) {
            stbi__get32le(s);
            stbi__get32le(s);
            stbi__get32le(s);
            stbi__get32le(s);
         }
         if (bpp == 16 || bpp == 32) {
            mr = mg = mb = 0;
            if (compress == 0) {
               if (bpp == 32) {
                  mr = 0xffu << 16;
                  mg = 0xffu <<  8;
                  mb = 0xffu <<  0;
                  ma = 0xffu << 24;
                  all_a = 0; // if all_a is 0 at end, then we loaded alpha channel but it was all 0
               } else {
                  mr = 31u << 10;
                  mg = 31u <<  5;
                  mb = 31u <<  0;
               }
            } else if (compress == 3) {
               mr = stbi__get32le(s);
               mg = stbi__get32le(s);
               mb = stbi__get32le(s);
               // not documented, but generated by photoshop and handled by mspaint
               if (mr == mg && mg == mb) {
                  // ?!?!?
                  return stbi__errpuc("bad BMP", "bad BMP");
               }
            } else
               return stbi__errpuc("bad BMP", "bad BMP");
         }
      } else {
         STBI_ASSERT(hsz == 108 || hsz == 124);
         mr = stbi__get32le(s);
         mg = stbi__get32le(s);
         mb = stbi__get32le(s);
         ma = stbi__get32le(s);
         stbi__get32le(s); // discard color space
         for (i=0; i < 12; ++i)
            stbi__get32le(s); // discard color space parameters
         if (hsz == 124) {
            stbi__get32le(s); // discard rendering intent
            stbi__get32le(s); // discard offset of profile data
            stbi__get32le(s); // discard size of profile data
            stbi__get32le(s); // discard reserved
         }
      }
      if (bpp < 16)
         psize = (offset - 14 - hsz) >> 2;
   }
   s->img_n = ma ? 4 : 3;
   if (req_comp && req_comp >= 3) // we can directly decode 3 or 4
      target = req_comp;
   else
      target = s->img_n; // if they want monochrome, we'll post-convert
   if (s->img_x == 0 || s->img_y == 0 || target <= 0 || target > (INT_MAX / s->img_x / s->img_y))
       return stbi__errpuc("Integer Overflow", "x or y incorrect");
   out = (stbi_uc *) stbi__malloc(target * s->img_x * s->img_y);
   if (!out) return stbi__errpuc("outofmem", "Out of memory");
   if (bpp < 16) {
      int z=0;
      if (psize == 0 || psize > 256) { STBI_FREE(out); return stbi__errpuc("invalid", "Corrupt BMP"); }
      for (i=0; i < psize; ++i) {
         pal[i][2] = stbi__get8(s);
         pal[i][1] = stbi__get8(s);
         pal[i][0] = stbi__get8(s);
         if (hsz != 12) stbi__get8(s);
         pal[i][3] = 255;
      }
      stbi__skip(s, offset - 14 - hsz - psize * (hsz == 12 ? 3 : 4));
      if (bpp == 4) width = (s->img_x + 1) >> 1;
      else if (bpp == 8) width = s->img_x;
      else { STBI_FREE(out); return stbi__errpuc("bad bpp", "Corrupt BMP"); }
      pad = (-width)&3;
      for (j=0; j < (int) s->img_y; ++j) {
         for (i=0; i < (int) s->img_x; i += 2) {
            int v=stbi__get8(s),v2=0;
            if (bpp == 4) {
               v2 = v & 15;
               v >>= 4;
            }
            out[z++] = pal[v][0];
            out[z++] = pal[v][1];
            out[z++] = pal[v][2];
            if (target == 4) out[z++] = 255;
            if (i+1 == (int) s->img_x) break;
            v = (bpp == 8) ? stbi__get8(s) : v2;
            out[z++] = pal[v][0];
            out[z++] = pal[v][1];
            out[z++] = pal[v][2];
            if (target == 4) out[z++] = 255;
         }
         stbi__skip(s, pad);
      }
   } else {
      int rshift=0,gshift=0,bshift=0,ashift=0,rcount=0,gcount=0,bcount=0,acount=0;
      int z = 0;
      int easy=0;
      stbi__skip(s, offset - 14 - hsz);
      if (bpp == 24) width = 3 * s->img_x;
      else if (bpp == 16) width = 2*s->img_x;
      else /* bpp = 32 and pad = 0 */ width=0;
      pad = (-width) & 3;
      if (bpp == 24) {
         easy = 1;
      } else if (bpp == 32) {
         if (mb == 0xff && mg == 0xff00 && mr == 0x00ff0000 && ma == 0xff000000)
            easy = 2;
      }
      if (!easy) {
         if (!mr || !mg || !mb) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
         // right shift amt to put high bit in position #7
         rshift = stbi__high_bit(mr)-7; rcount = stbi__bitcount(mr);
         gshift = stbi__high_bit(mg)-7; gcount = stbi__bitcount(mg);
         bshift = stbi__high_bit(mb)-7; bcount = stbi__bitcount(mb);
         ashift = stbi__high_bit(ma)-7; acount = stbi__bitcount(ma);
      }
      for (j=0; j < (int) s->img_y; ++j) {
         if (easy) {
            for (i=0; i < (int) s->img_x; ++i) {
               unsigned char a;
               out[z+2] = stbi__get8(s);
               out[z+1] = stbi__get8(s);
               out[z+0] = stbi__get8(s);
               z += 3;
               a = (easy == 2 ? stbi__get8(s) : 255);
               all_a |= a;
               if (target == 4) out[z++] = a;
            }
         } else {
            for (i=0; i < (int) s->img_x; ++i) {
               stbi__uint32 v = (bpp == 16 ? (stbi__uint32) stbi__get16le(s) : stbi__get32le(s));
               int a;
               out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mr, rshift, rcount));
               out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mg, gshift, gcount));
               out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mb, bshift, bcount));
               a = (ma ? stbi__shiftsigned(v & ma, ashift, acount) : 255);
               all_a |= a;
               if (target == 4) out[z++] = STBI__BYTECAST(a);
            }
         }
         stbi__skip(s, pad);
      }
   }

   // if alpha channel is all 0s, replace with all 255s
   if (target == 4 && all_a == 0)
      for (i=4*s->img_x*s->img_y-1; i >= 0; i -= 4)
         out[i] = 255;

   if (flip_vertically) {
      stbi_uc t;
      for (j=0; j < (int) s->img_y>>1; ++j) {
         stbi_uc *p1 = out +      j     *s->img_x*target;
         stbi_uc *p2 = out + (s->img_y-1-j)*s->img_x*target;
         for (i=0; i < (int) s->img_x*target; ++i) {
            t = p1[i], p1[i] = p2[i], p2[i] = t;
         }
      }
   }

   if (req_comp && req_comp != target) {
      out = stbi__convert_format(out, target, req_comp, s->img_x, s->img_y);
      if (out == NULL) return out; // stbi__convert_format frees input on failure
   }

   *x = s->img_x;
   *y = s->img_y;
   if (comp) *comp = s->img_n;
   return out;
}
#endif
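
// Illustrative sketch, not part of stb_image: BMP rows are padded to a multiple
// of 4 bytes, which the loader computes as pad = (-width) & 3. The helpers below
// spell that out for the bit depths handled above. All example_* names are
// hypothetical, and the negate-and-mask trick assumes two's complement ints.

```c
// Bytes of pixel data in one row, or -1 for a depth this sketch does not cover.
static int example_bmp_row_bytes(int width_px, int bpp)
{
   switch (bpp) {
      case 4:  return (width_px + 1) >> 1;  // two pixels per byte, rounded up
      case 8:  return width_px;
      case 16: return 2 * width_px;
      case 24: return 3 * width_px;
      case 32: return 4 * width_px;         // always a multiple of 4, so no padding
      default: return -1;
   }
}

// Padding bytes needed to round row_bytes up to a multiple of 4.
static int example_bmp_row_pad(int row_bytes)
{
   return (-row_bytes) & 3;
}

// Distance in bytes from the start of one stored row to the next.
static int example_bmp_row_stride(int width_px, int bpp)
{
   int bytes = example_bmp_row_bytes(width_px, bpp);
   if (bytes < 0) return -1;
   return bytes + example_bmp_row_pad(bytes);
}
```

// For a 5-pixel-wide 24-bit image, a row holds 15 data bytes plus 1 padding byte.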

// Targa Truevision - TGA
// by Jonathan Dummer
#ifndef STBI_NO_TGA
static int stbi__tga_info(stbi__context *s, int *x, int *y, int *comp)
{
    int tga_w, tga_h, tga_comp;
    int sz;
    stbi__get8(s);                   // discard Offset
    sz = stbi__get8(s);              // color type
    if( sz > 1 ) {
        stbi__rewind(s);
        return 0;      // only RGB or indexed allowed
    }
    sz = stbi__get8(s);              // image type
    // only RGB or grey allowed, +/- RLE
    if ((sz != 1) && (sz != 2) && (sz != 3) && (sz != 9) && (sz != 10) && (sz != 11)) {
        stbi__rewind(s);             // rewind on this failure path too, like the others
        return 0;
    }
    stbi__skip(s,9);
    tga_w = stbi__get16le(s);
    if( tga_w < 1 ) {
        stbi__rewind(s);
        return 0;   // test width
    }
    tga_h = stbi__get16le(s);
    if( tga_h < 1 ) {
        stbi__rewind(s);
        return 0;   // test height
    }
    sz = stbi__get8(s);               // bits per pixel
    // only RGB or RGBA or grey allowed
    if ((sz != 8) && (sz != 16) && (sz != 24) && (sz != 32)) {
        stbi__rewind(s);
        return 0;
    }
    tga_comp = sz;
    if (x) *x = tga_w;
    if (y) *y = tga_h;
    if (comp) *comp = tga_comp / 8;
    return 1;                   // seems to have passed everything
}

static int stbi__tga_test(stbi__context *s)
{
   int res = 0;
   int sz;
   stbi__get8(s);      //   discard Offset
   sz = stbi__get8(s);   //   color type
   if ( sz > 1 ) goto errorEnd;   //   only RGB or indexed allowed
   sz = stbi__get8(s);   //   image type
   if ( (sz != 1) && (sz != 2) && (sz != 3) && (sz != 9) && (sz != 10) && (sz != 11) ) goto errorEnd;   //   only RGB or grey allowed, +/- RLE
   stbi__get16le(s);      //   discard palette start
   stbi__get16le(s);      //   discard palette length
   stbi__get8(s);         //   discard bits per palette color entry
   stbi__get16le(s);      //   discard x origin
   stbi__get16le(s);      //   discard y origin
   if ( stbi__get16le(s) < 1 ) goto errorEnd;      //   test width
   if ( stbi__get16le(s) < 1 ) goto errorEnd;      //   test height
   sz = stbi__get8(s);   //   bits per pixel
   if ( (sz != 8) && (sz != 16) && (sz != 24) && (sz != 32) ) goto errorEnd;   //   only 8/16/24/32 bpp allowed
   res = 1;   //   seems to have passed everything

errorEnd:
   stbi__rewind(s);   //   always rewind so the next format test starts at the beginning
   return res;
}
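
// Illustrative sketch, not part of stb_image: the TGA probes above walk an
// 18-byte header field by field. Assuming that header is already in memory, the
// same validation can be expressed with fixed offsets. The struct and the
// example_* functions are hypothetical names for this example only.

```c
#include <stddef.h>

typedef struct {
   int image_type;      // 1 = indexed, 2 = RGB, 3 = grey (RLE flag removed)
   int is_rle;          // 1 if the original type was 9, 10 or 11
   int width, height;
   int bits_per_pixel;
   int top_down;        // bit 5 of the image descriptor byte
} example_tga_header;

// Hypothetical helper: read a 16-bit little-endian value from memory.
static int example_get16le(const unsigned char *p)
{
   return p[0] | (p[1] << 8);
}

// Returns 1 and fills *out if h holds a header the loader would accept.
static int example_tga_parse(const unsigned char *h, size_t len, example_tga_header *out)
{
   int type, bpp;
   if (len < 18) return 0;
   if (h[1] > 1) return 0;                  // color map type: only 0 or 1
   type = h[2];
   if (type != 1 && type != 2 && type != 3 &&
       type != 9 && type != 10 && type != 11) return 0;
   out->is_rle = type >= 8;
   out->image_type = out->is_rle ? type - 8 : type;
   out->width  = example_get16le(h + 12);
   out->height = example_get16le(h + 14);
   if (out->width < 1 || out->height < 1) return 0;
   bpp = h[16];
   if (bpp != 8 && bpp != 16 && bpp != 24 && bpp != 32) return 0;
   out->bits_per_pixel = bpp;
   out->top_down = (h[17] >> 5) & 1;
   return 1;
}
```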

static stbi_uc *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   //   read in the TGA header stuff
   int tga_offset = stbi__get8(s);
   int tga_indexed = stbi__get8(s);
   int tga_image_type = stbi__get8(s);
   int tga_is_RLE = 0;
   int tga_palette_start = stbi__get16le(s);
   int tga_palette_len = stbi__get16le(s);
   int tga_palette_bits = stbi__get8(s);
   int tga_x_origin = stbi__get16le(s);
   int tga_y_origin = stbi__get16le(s);
   int tga_width = stbi__get16le(s);
   int tga_height = stbi__get16le(s);
   int tga_bits_per_pixel = stbi__get8(s);
   int tga_comp = tga_bits_per_pixel / 8;
   int tga_inverted = stbi__get8(s);
   //   image data
   unsigned char *tga_data;
   unsigned char *tga_palette = NULL;
   int i, j;
   unsigned char raw_data[4];
   int RLE_count = 0;
   int RLE_repeating = 0;
   int read_next_pixel = 1;

   //   do a tiny bit of processing
   if ( tga_image_type >= 8 )
   {
      tga_image_type -= 8;
      tga_is_RLE = 1;
   }
   /* int tga_alpha_bits = tga_inverted & 15; */
   tga_inverted = 1 - ((tga_inverted >> 5) & 1);

   //   error check
   if ( //(tga_indexed) ||
      (tga_width < 1) || (tga_height < 1) ||
      (tga_image_type < 1) || (tga_image_type > 3) ||
      ((tga_bits_per_pixel != 8) && (tga_bits_per_pixel != 16) &&
      (tga_bits_per_pixel != 24) && (tga_bits_per_pixel != 32))
      )
   {
      return NULL; // we don't report this as a bad TGA because we don't even know if it's TGA
   }

   //   If I'm paletted, then I'll use the number of bits from the palette
   if ( tga_indexed )
   {
      tga_comp = tga_palette_bits / 8;
   }

   //   raw_data holds at most 4 bytes per pixel, so reject anything wider
   if ( tga_comp < 1 || tga_comp > 4 )
      return stbi__errpuc("bad format", "Unsupported TGA pixel format");

   //   tga info
   *x = tga_width;
   *y = tga_height;
   if (comp) *comp = tga_comp;

   if(tga_width <= 0 || tga_height <= 0 || tga_comp <= 0 ||
           (tga_comp > INT_MAX / tga_width / tga_height))
       return stbi__errpuc("Integer Overflow", "TGA image width or height is too large");

   tga_data = (unsigned char*)stbi__malloc( (size_t)tga_width * tga_height * tga_comp );
   if (!tga_data) return stbi__errpuc("outofmem", "Out of memory");

   // skip to the data's starting position (offset usually = 0)
   stbi__skip(s, tga_offset );

   if ( !tga_indexed && !tga_is_RLE) {
      for (i=0; i < tga_height; ++i) {
         int row = tga_inverted ? tga_height -i - 1 : i;
         stbi_uc *tga_row = tga_data + row*tga_width*tga_comp;
         stbi__getn(s, tga_row, tga_width * tga_comp);
      }
   } else  {
      //   do I need to load a palette?
      if ( tga_indexed)
      {
         //   any data to skip? (offset usually = 0)
         stbi__skip(s, tga_palette_start );
         //   load the palette
         tga_palette = (unsigned char*)stbi__malloc( tga_palette_len * tga_palette_bits / 8 );
         if (!tga_palette) {
            STBI_FREE(tga_data);
            return stbi__errpuc("outofmem", "Out of memory");
         }
         if (!stbi__getn(s, tga_palette, tga_palette_len * tga_palette_bits / 8 )) {
            STBI_FREE(tga_data);
            STBI_FREE(tga_palette);
            return stbi__errpuc("bad palette", "Corrupt TGA");
         }
      }
      //   load the data
      for (i=0; i < tga_width * tga_height; ++i)
      {
         //   if I'm in RLE mode, do I need to get a RLE chunk?
         if ( tga_is_RLE )
         {
            if ( RLE_count == 0 )
            {
               //   yep, get the next byte as a RLE command
               int RLE_cmd = stbi__get8(s);
               RLE_count = 1 + (RLE_cmd & 127);
               RLE_repeating = RLE_cmd >> 7;
               read_next_pixel = 1;
            } else if ( !RLE_repeating )
            {
               read_next_pixel = 1;
            }
         } else
         {
            read_next_pixel = 1;
         }
         //   OK, if I need to read a pixel, do it now
         if ( read_next_pixel )
         {
            //   load however much data we did have
            if ( tga_indexed )
            {
               //   read in 1 byte, then perform the lookup
               int pal_idx = stbi__get8(s);
               if ( pal_idx >= tga_palette_len )
               {
                  //   invalid index
                  pal_idx = 0;
               }
               //   palette entries are tga_comp bytes wide (tga_palette_bits / 8),
               //   not tga_bits_per_pixel / 8, which is the size of the index
               pal_idx *= tga_comp;
               for (j = 0; j < tga_comp; ++j)
               {
                  raw_data[j] = tga_palette[pal_idx+j];
               }
            } else
            {
               //   read in the data raw
               for (j = 0; j*8 < tga_bits_per_pixel; ++j)
               {
                  raw_data[j] = stbi__get8(s);
               }
            }
            //   clear the reading flag for the next pixel
            read_next_pixel = 0;
         } // end of reading a pixel

         // copy data
         for (j = 0; j < tga_comp; ++j)
           tga_data[i*tga_comp+j] = raw_data[j];

         //   in case we're in RLE mode, keep counting down
         --RLE_count;
      }
      //   do I need to invert the image?
      if ( tga_inverted )
      {
         for (j = 0; j*2 < tga_height; ++j)
         {
            int index1 = j * tga_width * tga_comp;
            int index2 = (tga_height - 1 - j) * tga_width * tga_comp;
            for (i = tga_width * tga_comp; i > 0; --i)
            {
               unsigned char temp = tga_data[index1];
               tga_data[index1] = tga_data[index2];
               tga_data[index2] = temp;
               ++index1;
               ++index2;
            }
         }
      }
      //   clear my palette, if I had one
      if ( tga_palette != NULL )
      {
         STBI_FREE( tga_palette );
      }
   }

   // swap RGB
   if (tga_comp >= 3)
   {
      unsigned char* tga_pixel = tga_data;
      for (i=0; i < tga_width * tga_height; ++i)
      {
         unsigned char temp = tga_pixel[0];
         tga_pixel[0] = tga_pixel[2];
         tga_pixel[2] = temp;
         tga_pixel += tga_comp;
      }
   }

   // convert to target component count
   if (req_comp && req_comp != tga_comp)
      tga_data = stbi__convert_format(tga_data, tga_comp, req_comp, tga_width, tga_height);

   //   the things I do to get rid of an error message, and yet keep
   //   Microsoft's C compilers happy... [8^(
   tga_palette_start = tga_palette_len = tga_palette_bits =
         tga_x_origin = tga_y_origin = 0;
   //   OK, done
   return tga_data;
}
#endif
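
// Illustrative sketch, not part of stb_image: the RLE handling in the TGA loader
// is interleaved with palette lookup, which makes the packet format hard to see.
// Assuming the compressed data is already in memory, a standalone decoder for
// the same packets (count = 1 + low 7 bits, high bit = repeat) could look like
// this. example_tga_rle_decode is a hypothetical name, and clamping the final
// packet to the image size is a choice made for this sketch.

```c
#include <stddef.h>
#include <string.h>

// Decodes pixel_count pixels of bytes_per_pixel bytes each from src into dst.
// Returns the number of pixels written, or -1 if src ends too early.
static int example_tga_rle_decode(const unsigned char *src, size_t src_len,
                                  unsigned char *dst, int pixel_count, int bytes_per_pixel)
{
   size_t pos = 0;
   int written = 0;
   while (written < pixel_count) {
      int cmd, count, i;
      if (pos >= src_len) return -1;
      cmd = src[pos++];
      count = 1 + (cmd & 127);
      if (count > pixel_count - written)
         count = pixel_count - written;     // clamp a packet that overruns the image
      if (cmd >> 7) {
         // repeat packet: one pixel value, emitted count times
         if (src_len - pos < (size_t) bytes_per_pixel) return -1;
         for (i = 0; i < count; ++i)
            memcpy(dst + (size_t) (written + i) * bytes_per_pixel, src + pos,
                   (size_t) bytes_per_pixel);
         pos += (size_t) bytes_per_pixel;
      } else {
         // raw packet: count literal pixels
         size_t need = (size_t) count * bytes_per_pixel;
         if (src_len - pos < need) return -1;
         memcpy(dst + (size_t) written * bytes_per_pixel, src + pos, need);
         pos += need;
      }
      written += count;
   }
   return written;
}
```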

// *************************************************************************************************
// Photoshop PSD loader -- PD by Thatcher Ulrich, integration by Nicolas Schulz, tweaked by STB

#ifndef STBI_NO_PSD
static int stbi__psd_test(stbi__context *s)
{
   int r = (stbi__get32be(s) == 0x38425053);
   stbi__rewind(s);
   return r;
}
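
// Illustrative sketch, not part of stb_image: the constant 0x38425053 tested
// above is the ASCII tag "8BPS" read as a 32-bit big-endian value. Assuming the
// first bytes of the file are in memory, the check can be written as follows.
// example_get32be and example_is_psd are hypothetical names.

```c
#include <stddef.h>

// Hypothetical helper: read a 32-bit big-endian value from memory.
static unsigned int example_get32be(const unsigned char *p)
{
   return ((unsigned int) p[0] << 24) | ((unsigned int) p[1] << 16) |
          ((unsigned int) p[2] << 8)  |  (unsigned int) p[3];
}

// Returns 1 if buf starts with the PSD signature "8BPS".
static int example_is_psd(const unsigned char *buf, size_t len)
{
   if (len < 4) return 0;
   return example_get32be(buf) == 0x38425053u;   // '8' 'B' 'P' 'S'
}
```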

static stbi_uc *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   int   pixelCount;
   int channelCount, compression;
   int channel, i, count, len;
   int bitdepth;
   int w,h;
   stbi_uc *out;

   // Check identifier
   if (stbi__get32be(s) != 0x38425053)   // "8BPS"
      return stbi__errpuc("not PSD", "Corrupt PSD image");

   // Check file type version.
   if (stbi__get16be(s) != 1)
      return stbi__errpuc("wrong version", "Unsupported version of PSD image");

   // Skip 6 reserved bytes.
   stbi__skip(s, 6 );

   // Read the number of channels (R, G, B, A, etc).
   channelCount = stbi__get16be(s);
   if (channelCount < 0 || channelCount > 16)
      return stbi__errpuc("wrong channel count", "Unsupported number of channels in PSD image");

   // Read the rows and columns of the image.
   h = stbi__get32be(s);
   w = stbi__get32be(s);

   // Make sure the depth is 8 or 16 bits.
   bitdepth = stbi__get16be(s);
   if (bitdepth != 8 && bitdepth != 16)
      return stbi__errpuc("unsupported bit depth", "PSD bit depth is not 8 or 16 bit");

   // Make sure the color mode is RGB.
   // Valid options are:
   //   0: Bitmap
   //   1: Grayscale
   //   2: Indexed color
   //   3: RGB color
   //   4: CMYK color
   //   7: Multichannel
   //   8: Duotone
   //   9: Lab color
   if (stbi__get16be(s) != 3)
      return stbi__errpuc("wrong color format", "PSD is not in RGB color format");

   // Skip the Mode Data.  (It's the palette for indexed color; other info for other modes.)
   stbi__skip(s,stbi__get32be(s) );

   // Skip the image resources.  (resolution, pen tool paths, etc)
   stbi__skip(s, stbi__get32be(s) );

   // Skip the reserved data.
   stbi__skip(s, stbi__get32be(s) );

   // Find out if the data is compressed.
   // Known values:
   //   0: no compression
   //   1: RLE compressed
   compression = stbi__get16be(s);
   if (compression > 1)
      return stbi__errpuc("bad compression", "PSD has an unknown compression format");

   // Create the destination image.
   if (w <= 0 || h <= 0 ||
           (4 > (INT_MAX / w / h)))
       return stbi__errpuc("Integer Overflow", "w or h incorrect");
   out = (stbi_uc *) stbi__malloc(4 * w*h);
   if (!out) return stbi__errpuc("outofmem", "Out of memory");
   pixelCount = w*h;

   // Initialize the data to zero.
   //memset( out, 0, pixelCount * 4 );

   // Finally, the image data.
   if (compression) {
      // RLE as used by .PSD and .TIFF
      // Loop until you get the number of unpacked bytes you are expecting:
      //     Read the next source byte into n.
      //     If n is between 0 and 127 inclusive, copy the next n+1 bytes literally.
      //     Else if n is between -127 and -1 inclusive, copy the next byte -n+1 times.
      //     Else if n is 128, noop.
      // Endloop

      // The RLE-compressed data is preceded by a 2-byte data count for each row in the data,
      // which we're going to just skip.
      stbi__skip(s, h * channelCount * 2 );

      // Read the RLE data by channel.
      for (channel = 0; channel < 4; channel++) {
         stbi_uc *p;

         p = out+channel;
         if (channel >= channelCount) {
            // Fill this channel with default data.
            for (i = 0; i < pixelCount; i++, p += 4)
               *p = (channel == 3 ? 255 : 0);
         } else {
            // Read the RLE data.
            count = 0;
            while (count < pixelCount) {
               len = stbi__get8(s);
               if (len == 128) {
                  // No-op.
               } else if (len < 128) {
                  // Copy next len+1 bytes literally.
                  len++;
                  // A run may exactly fill the remaining pixels; only a longer one is corrupt.
                  if (len > pixelCount - count) {
                     STBI_FREE(out);
                     return stbi__errpuc("corruptfile", "Corrupt PSD file");
                  }
                  count += len;
                  while (len) {
                     *p = stbi__get8(s);
                     p += 4;
                     len--;
                  }
               } else if (len > 128) {
                  stbi_uc   val;
                  // Next -len+1 bytes in the dest are replicated from next source byte.
                  // (Interpret len as a negative 8-bit int.)
                  len ^= 0x0FF;
                  len += 2;
                  val = stbi__get8(s);
                  // A run may exactly fill the remaining pixels; only a longer one is corrupt.
                  if (len > pixelCount - count) {
                     STBI_FREE(out);
                     return stbi__errpuc("corruptfile", "Corrupt PSD file");
                  }
                  count += len;
                  while (len) {
                     *p = val;
                     p += 4;
                     len--;
                  }
               }
            }
         }
      }

   } else {
      // We're at the raw image data.  It's each channel in order (Red, Green, Blue, Alpha, ...)
      // where each channel consists of an 8-bit or 16-bit value for each pixel in the image.

      // Read the data by channel.
      for (channel = 0; channel < 4; channel++) {
         stbi_uc *p;

         p = out + channel;
         if (channel >= channelCount) {
            // Fill this channel with default data.
            stbi_uc val = channel == 3 ? 255 : 0;
            for (i = 0; i < pixelCount; i++, p += 4)
               *p = val;
         } else {
            // Read the data; 16-bit samples are reduced to their high byte.
            if (bitdepth == 16) {
               for (i = 0; i < pixelCount; i++, p += 4)
                  *p = (stbi_uc) (stbi__get16be(s) >> 8);
            } else {
               for (i = 0; i < pixelCount; i++, p += 4)
                  *p = stbi__get8(s);
            }
         }
      }
   }

   if (req_comp && req_comp != 4) {
      out = stbi__convert_format(out, 4, req_comp, w, h);
      if (out == NULL) return out; // stbi__convert_format frees input on failure
   }

   if (comp) *comp = 4;
   *y = h;
   *x = w;

   return out;
}
#endif
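
// Illustrative sketch, not part of stb_image: the RLE scheme described in the
// PSD loader's comments (a control byte n, then either n+1 literal bytes or one
// byte repeated 257-n times) can be isolated into a small decoder. It assumes
// the compressed channel is in memory and, like the loader, writes one decoded
// byte every 'stride' bytes so that planar channels land in an interleaved
// buffer. example_packbits_decode is a hypothetical name.

```c
#include <stddef.h>

// Decodes exactly pixel_count bytes from src into dst, stepping dst by stride.
// Returns 1 on success, 0 if the data is truncated or a run overruns the channel.
static int example_packbits_decode(const unsigned char *src, size_t src_len,
                                   unsigned char *dst, int pixel_count, int stride)
{
   size_t pos = 0;
   int count = 0;
   while (count < pixel_count) {
      int len;
      if (pos >= src_len) return 0;
      len = src[pos++];
      if (len == 128) {
         // no-op control byte
      } else if (len < 128) {
         len++;                                       // literal run of len bytes
         if (len > pixel_count - count) return 0;     // longer than what is left
         if (src_len - pos < (size_t) len) return 0;
         count += len;
         while (len--) { *dst = src[pos++]; dst += stride; }
      } else {
         unsigned char val;
         len = 257 - len;                             // same as (len ^ 0xFF) + 2
         if (len > pixel_count - count) return 0;
         if (pos >= src_len) return 0;
         val = src[pos++];
         count += len;
         while (len--) { *dst = val; dst += stride; }
      }
   }
   return 1;
}
```

// A run that exactly fills the remaining pixels is valid; only a longer run is
// treated as corrupt.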

// *************************************************************************************************
// Softimage PIC loader
// by Tom Seddon
//
// See http://softimage.wiki.softimage.com/index.php/INFO:_PIC_file_format
// See http://ozviz.wasp.uwa.edu.au/~pbourke/dataformats/softimagepic/

#ifndef STBI_NO_PIC
static int stbi__pic_is4(stbi__context *s,const char *str)
{
   int i;
   for (i=0; i<4; ++i)
      if (stbi__get8(s) != (stbi_uc)str[i])
         return 0;

   return 1;
}

static int stbi__pic_test_core(stbi__context *s)
{
   int i;

   if (!stbi__pic_is4(s,"\x53\x80\xF6\x34"))
      return 0;

   for(i=0;i<84;++i)
      stbi__get8(s);

   if (!stbi__pic_is4(s,"PICT"))
      return 0;

   return 1;
}

typedef struct
{
   stbi_uc size,type,channel;
} stbi__pic_packet;

static stbi_uc *stbi__readval(stbi__context *s, int channel, stbi_uc *dest)
{
   int mask=0x80, i;

   for (i=0; i<4; ++i, mask>>=1) {
      if (channel & mask) {
         if (stbi__at_eof(s)) return stbi__errpuc("bad file","PIC file too short");
         dest[i]=stbi__get8(s);
      }
   }

   return dest;
}

static void stbi__copyval(int channel,stbi_uc *dest,const stbi_uc *src)
{
   int mask=0x80,i;

   for (i=0;i<4; ++i, mask>>=1)
      if (channel&mask)
         dest[i]=src[i];
}
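
// Illustrative sketch, not part of stb_image: stbi__readval() and stbi__copyval()
// above treat the packet's channel byte as a bitmask, where 0x80, 0x40, 0x20 and
// 0x10 select bytes 0..3 (R, G, B, A) of an RGBA pixel. The standalone copy below
// shows the effect on a single pixel. example_copyval is a hypothetical name.

```c
// Copies only the components selected by 'channel' from src to dest.
static void example_copyval(int channel, unsigned char *dest, const unsigned char *src)
{
   int mask = 0x80, i;
   for (i = 0; i < 4; ++i, mask >>= 1)
      if (channel & mask)
         dest[i] = src[i];
}
```

// A packet with channel 0xE0 therefore carries RGB and leaves alpha untouched,
// while a packet with channel 0x10 carries alpha only.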

static stbi_uc *stbi__pic_load_core(stbi__context *s,int width,int height,int *comp, stbi_uc *result)
{
   int act_comp=0,num_packets=0,y,chained;
   stbi__pic_packet packets[10];

   // this will (should...) cater for even some bizarre stuff like having data
   // for the same channel in multiple packets.
   do {
      stbi__pic_packet *packet;

      if (num_packets==sizeof(packets)/sizeof(packets[0]))
         return stbi__errpuc("bad format","too many packets");

      packet = &packets[num_packets++];

      chained = stbi__get8(s);
      packet->size    = stbi__get8(s);
      packet->type    = stbi__get8(s);
      packet->channel = stbi__get8(s);

      act_comp |= packet->channel;

      if (stbi__at_eof(s))          return stbi__errpuc("bad file","file too short (reading packets)");
      if (packet->size != 8)  return stbi__errpuc("bad format","packet isn't 8bpp");
   } while (chained);

   *comp = (act_comp & 0x10 ? 4 : 3); // has alpha channel?

5396    for(y=0; y<height; ++y) {
5397       int packet_idx;
5398 
5399       for(packet_idx=0; packet_idx < num_packets; ++packet_idx) {
5400          stbi__pic_packet *packet = &packets[packet_idx];
5401          stbi_uc *dest = result+y*width*4;
5402 
5403          switch (packet->type) {
5404             default:
5405                return stbi__errpuc("bad format","packet has bad compression type");
5406 
5407             case 0: {//uncompressed
5408                int x;
5409 
5410                for(x=0;x<width;++x, dest+=4)
5411                   if (!stbi__readval(s,packet->channel,dest))
5412                      return 0;
5413                break;
5414             }
5415 
5416             case 1://Pure RLE
5417                {
5418                   int left=width, i;
5419 
5420                   while (left>0) {
5421                      stbi_uc count,value[4];
5422 
5423                      count=stbi__get8(s);
5424                      if (stbi__at_eof(s))   return stbi__errpuc("bad file","file too short (pure read count)");
5425 
5426                      if (count > left)
5427                         count = (stbi_uc) left;
5428 
5429                      if (!stbi__readval(s,packet->channel,value))  return 0;
5430 
5431                      for(i=0; i<count; ++i,dest+=4)
5432                         stbi__copyval(packet->channel,dest,value);
5433                      left -= count;
5434                   }
5435                }
5436                break;
5437 
5438             case 2: {//Mixed RLE
5439                int left=width;
5440                while (left>0) {
5441                   int count = stbi__get8(s), i;
5442                   if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (mixed read count)");
5443 
5444                   if (count >= 128) { // Repeated
5445                      stbi_uc value[4];
5446 
5447                      if (count==128)
5448                         count = stbi__get16be(s);
5449                      else
5450                         count -= 127;
5451                      if (count > left)
5452                         return stbi__errpuc("bad file","scanline overrun");
5453 
5454                      if (!stbi__readval(s,packet->channel,value))
5455                         return 0;
5456 
5457                      for(i=0;i<count;++i, dest += 4)
5458                         stbi__copyval(packet->channel,dest,value);
5459                   } else { // Raw
5460                      ++count;
5461                      if (count>left) return stbi__errpuc("bad file","scanline overrun");
5462 
5463                      for(i=0;i<count;++i, dest+=4)
5464                         if (!stbi__readval(s,packet->channel,dest))
5465                            return 0;
5466                   }
5467                   left-=count;
5468                }
5469                break;
5470             }
5471          }
5472       }
5473    }
5474 
5475    return result;
5476 }
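
// Illustration only, not part of stb_image: the "Mixed RLE" case above can be
// sketched as a standalone routine over a memory buffer. It assumes one byte
// per value (a single channel) instead of stbi__readval's per-channel reads,
// and the name demo_mixed_rle_decode is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

// Decode one scanline of `width` single-byte values from a mixed-RLE stream.
// Returns the number of input bytes consumed, or 0 on malformed input.
static size_t demo_mixed_rle_decode(const unsigned char *in, size_t in_len,
                                    unsigned char *out, int width)
{
   size_t pos = 0;
   int left = width;
   assert(in != NULL && out != NULL);
   while (left > 0) {
      int count, i;
      if (pos >= in_len) return 0;              // ran out of input
      count = in[pos++];
      if (count >= 128) {                       // repeated value
         unsigned char value;
         if (count == 128) {                    // 16-bit big-endian run length follows
            if (pos + 2 > in_len) return 0;
            count = (in[pos] << 8) | in[pos+1];
            pos += 2;
         } else {
            count -= 127;                       // 129..255 encode runs of 2..128
         }
         if (count > left || pos >= in_len) return 0;
         value = in[pos++];
         for (i = 0; i < count; ++i) *out++ = value;
      } else {                                  // raw: count+1 literal values
         ++count;
         if (count > left || pos + (size_t) count > in_len) return 0;
         for (i = 0; i < count; ++i) *out++ = in[pos++];
      }
      left -= count;
   }
   return pos;
}
```

// For example, the bytes 0x82 0x07 0x01 0x01 0x02 decode to 7 7 7 1 2:
// a run of three 7s followed by two literal values.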
5477 
5478 static stbi_uc *stbi__pic_load(stbi__context *s,int *px,int *py,int *comp,int req_comp)
5479 {
5480    stbi_uc *result;
5481    int i, x,y;
5482 
5483    for (i=0; i<92; ++i)
5484       stbi__get8(s);
5485 
5486    x = stbi__get16be(s);
5487    y = stbi__get16be(s);
5488    if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (pic header)");
5489    if (x != 0 && (1 << 28) / x < y) return stbi__errpuc("too large", "Image too large to decode");
5490 
5491    stbi__get32be(s); //skip `ratio'
5492    stbi__get16be(s); //skip `fields'
5493    stbi__get16be(s); //skip `pad'
5494 
5495    if (x <= 0 || y <= 0 ||
5496            (4 > (INT_MAX / x / y)))
5497        return stbi__errpuc("Integer Overflow", "x or y incorrect");
5498    // intermediate buffer is RGBA
5499    result = (stbi_uc *) stbi__malloc(x*y*4);
5500    if(result == NULL) return stbi__errpuc("outofmem", "Out of memory");
5501    memset(result, 0xff, x*y*4);
5502 
5503    if (!stbi__pic_load_core(s,x,y,comp, result)) {
5504       STBI_FREE(result);
5505       return 0; // don't pass a freed buffer (or an unset *comp) on to stbi__convert_format
5506    }
5507    *px = x;
5508    *py = y;
5509    if (req_comp == 0) req_comp = *comp;
5510    result=stbi__convert_format(result,4,req_comp,x,y);
5511 
5512    return result;
5513 }
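
// Illustration only, not part of stb_image: the size guard used before the
// RGBA allocation above can be sketched as a small helper. It divides INT_MAX
// down instead of multiplying the dimensions up, so the check itself cannot
// overflow. The name demo_size_fits_int is hypothetical.

```c
#include <limits.h>

// Returns 1 if x*y*comps can be computed in an int without overflow, else 0.
static int demo_size_fits_int(int x, int y, int comps)
{
   if (x <= 0 || y <= 0 || comps <= 0) return 0;
   // INT_MAX / x / y is the largest component count that still fits
   return comps <= INT_MAX / x / y;
}
```

// A caller would test demo_size_fits_int(x, y, 4) before calling malloc(x*y*4).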
5514 
5515 static int stbi__pic_test(stbi__context *s)
5516 {
5517    int r = stbi__pic_test_core(s);
5518    stbi__rewind(s);
5519    return r;
5520 }
5521 #endif
5522 
5523 // *************************************************************************************************
5524 // GIF loader -- public domain by Jean-Marc Lienher -- simplified/shrunk by stb
5525 
5526 #ifndef STBI_NO_GIF
5527 typedef struct
5528 {
5529    stbi__int16 prefix;
5530    stbi_uc first;
5531    stbi_uc suffix;
5532 } stbi__gif_lzw;
5533 
5534 typedef struct
5535 {
5536    int w,h;
5537    stbi_uc *out, *old_out;             // output buffer (always 4 components)
5538    int flags, bgindex, ratio, transparent, eflags, delay;
5539    stbi_uc  pal[256][4];
5540    stbi_uc lpal[256][4];
5541    stbi__gif_lzw codes[4096];
5542    stbi_uc *color_table;
5543    int parse, step;
5544    int lflags;
5545    int start_x, start_y;
5546    int max_x, max_y;
5547    int cur_x, cur_y;
5548    int line_size;
5549 } stbi__gif;
5550 
5551 static int stbi__gif_test_raw(stbi__context *s)
5552 {
5553    int sz;
5554    if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8') return 0;
5555    sz = stbi__get8(s);
5556    if (sz != '9' && sz != '7') return 0;
5557    if (stbi__get8(s) != 'a') return 0;
5558    return 1;
5559 }
5560 
5561 static int stbi__gif_test(stbi__context *s)
5562 {
5563    int r = stbi__gif_test_raw(s);
5564    stbi__rewind(s);
5565    return r;
5566 }
5567 
5568 static void stbi__gif_parse_colortable(stbi__context *s, stbi_uc pal[256][4], int num_entries, int transp)
5569 {
5570    int i;
5571    for (i=0; i < num_entries; ++i) {
5572       pal[i][2] = stbi__get8(s);
5573       pal[i][1] = stbi__get8(s);
5574       pal[i][0] = stbi__get8(s);
5575       pal[i][3] = transp == i ? 0 : 255;
5576    }
5577 }
5578 
5579 static int stbi__gif_header(stbi__context *s, stbi__gif *g, int *comp, int is_info)
5580 {
5581    stbi_uc version;
5582    if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8')
5583       return stbi__err("not GIF", "Corrupt GIF");
5584 
5585    version = stbi__get8(s);
5586    if (version != '7' && version != '9')    return stbi__err("not GIF", "Corrupt GIF");
5587    if (stbi__get8(s) != 'a')                return stbi__err("not GIF", "Corrupt GIF");
5588 
5589    stbi__g_failure_reason = "";
5590    g->w = stbi__get16le(s);
5591    g->h = stbi__get16le(s);
5592    g->flags = stbi__get8(s);
5593    g->bgindex = stbi__get8(s);
5594    g->ratio = stbi__get8(s);
5595    g->transparent = -1;
5596 
5597    if (comp != 0) *comp = 4;  // can't actually tell whether it's 3 or 4 until we parse the comments
5598 
5599    if (is_info) return 1;
5600 
5601    if (g->flags & 0x80)
5602       stbi__gif_parse_colortable(s,g->pal, 2 << (g->flags & 7), -1);
5603 
5604    return 1;
5605 }
5606 
5607 static int stbi__gif_info_raw(stbi__context *s, int *x, int *y, int *comp)
5608 {
5609    stbi__gif g;
5610    if (!stbi__gif_header(s, &g, comp, 1)) {
5611       stbi__rewind( s );
5612       return 0;
5613    }
5614    if (x) *x = g.w;
5615    if (y) *y = g.h;
5616    return 1;
5617 }
5618 
5619 static void stbi__out_gif_code(stbi__gif *g, stbi__uint16 code)
5620 {
5621    stbi_uc *p, *c;
5622 
5623    // recurse to decode the prefixes, since the linked-list is backwards,
5624    // and working backwards through an interleaved image would be nasty
5625    if (g->codes[code].prefix >= 0)
5626       stbi__out_gif_code(g, g->codes[code].prefix);
5627 
5628    if (g->cur_y >= g->max_y) return;
5629 
5630    p = &g->out[g->cur_x + g->cur_y];
5631    c = &g->color_table[g->codes[code].suffix * 4];
5632 
5633    if (c[3] >= 128) {
5634       p[0] = c[2];
5635       p[1] = c[1];
5636       p[2] = c[0];
5637       p[3] = c[3];
5638    }
5639    g->cur_x += 4;
5640 
5641    if (g->cur_x >= g->max_x) {
5642       g->cur_x = g->start_x;
5643       g->cur_y += g->step;
5644 
5645       while (g->cur_y >= g->max_y && g->parse > 0) {
5646          g->step = (1 << g->parse) * g->line_size;
5647          g->cur_y = g->start_y + (g->step >> 1);
5648          --g->parse;
5649       }
5650    }
5651 }
5652 
5653 static stbi_uc *stbi__process_gif_raster(stbi__context *s, stbi__gif *g)
5654 {
5655    stbi_uc lzw_cs;
5656    stbi__int32 len, init_code;
5657    stbi__uint32 first;
5658    stbi__int32 codesize, codemask, avail, oldcode, bits, valid_bits, clear;
5659    stbi__gif_lzw *p;
5660 
5661    lzw_cs = stbi__get8(s);
5662    if (lzw_cs > 12) return NULL;
5663    clear = 1 << lzw_cs;
5664    first = 1;
5665    codesize = lzw_cs + 1;
5666    codemask = (1 << codesize) - 1;
5667    bits = 0;
5668    valid_bits = 0;
5669    for (init_code = 0; init_code < clear; init_code++) {
5670       g->codes[init_code].prefix = -1;
5671       g->codes[init_code].first = (stbi_uc) init_code;
5672       g->codes[init_code].suffix = (stbi_uc) init_code;
5673    }
5674 
5675    // support no starting clear code
5676    avail = clear+2;
5677    oldcode = -1;
5678 
5679    len = 0;
5680    for(;;) {
5681       if (valid_bits < codesize) {
5682          if (len == 0) {
5683             len = stbi__get8(s); // start new block
5684             if (len == 0)
5685                return g->out;
5686          }
5687          --len;
5688          bits |= (stbi__int32) stbi__get8(s) << valid_bits;
5689          valid_bits += 8;
5690       } else {
5691          stbi__int32 code = bits & codemask;
5692          bits >>= codesize;
5693          valid_bits -= codesize;
5694          // @OPTIMIZE: is there some way we can accelerate the non-clear path?
5695          if (code == clear) {  // clear code
5696             codesize = lzw_cs + 1;
5697             codemask = (1 << codesize) - 1;
5698             avail = clear + 2;
5699             oldcode = -1;
5700             first = 0;
5701          } else if (code == clear + 1) { // end of stream code
5702             stbi__skip(s, len);
5703             while ((len = stbi__get8(s)) > 0)
5704                stbi__skip(s,len);
5705             return g->out;
5706          } else if (code <= avail) {
5707             if (first) return stbi__errpuc("no clear code", "Corrupt GIF");
5708 
5709             if (oldcode >= 0) {
5710                p = &g->codes[avail++];
5711                if (avail > 4096)        return stbi__errpuc("too many codes", "Corrupt GIF");
5712                p->prefix = (stbi__int16) oldcode;
5713                p->first = g->codes[oldcode].first;
5714                p->suffix = (code == avail) ? p->first : g->codes[code].first;
5715             } else if (code == avail)
5716                return stbi__errpuc("illegal code in raster", "Corrupt GIF");
5717 
5718             stbi__out_gif_code(g, (stbi__uint16) code);
5719 
5720             if ((avail & codemask) == 0 && avail <= 0x0FFF) {
5721                codesize++;
5722                codemask = (1 << codesize) - 1;
5723             }
5724 
5725             oldcode = code;
5726          } else {
5727             return stbi__errpuc("illegal code in raster", "Corrupt GIF");
5728          }
5729       }
5730    }
5731 }
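
// Illustration only, not part of stb_image: the bits/valid_bits accumulator in
// stbi__process_gif_raster pulls codes out of the byte stream least-significant
// bit first. The sketch below shows just that unpacking step with a fixed code
// width; the real decoder also grows codesize as the table fills and handles
// sub-block lengths. The name demo_unpack_codes is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

// Unpack fixed-width codes, LSB first, from a byte buffer.
// Returns the number of complete codes written to codes[].
static size_t demo_unpack_codes(const unsigned char *in, size_t in_len,
                                int codesize, int *codes, size_t max_codes)
{
   unsigned int bits = 0, codemask;
   int valid_bits = 0;
   size_t pos = 0, n = 0;
   assert(codesize >= 1 && codesize <= 12);
   codemask = (1u << codesize) - 1;
   while (n < max_codes) {
      if (valid_bits < codesize) {
         if (pos == in_len) break;                        // no more input
         bits |= (unsigned int) in[pos++] << valid_bits;  // append above the old bits
         valid_bits += 8;
      } else {
         codes[n++] = (int) (bits & codemask);            // take the low codesize bits
         bits >>= codesize;
         valid_bits -= codesize;
      }
   }
   return n;
}
```

// With codesize 3, the bytes 0x8C 0x2D unpack to the codes 4 1 6 6 2; the one
// leftover bit is discarded.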
5732 
5733 static void stbi__fill_gif_background(stbi__gif *g, int x0, int y0, int x1, int y1)
5734 {
5735    int x, y;
5736    stbi_uc *c = g->pal[g->bgindex];
5737    for (y = y0; y < y1; y += 4 * g->w) {
5738       for (x = x0; x < x1; x += 4) {
5739          stbi_uc *p  = &g->out[y + x];
5740          p[0] = c[2];
5741          p[1] = c[1];
5742          p[2] = c[0];
5743          p[3] = 0;
5744       }
5745    }
5746 }
5747 
5748 // this function is designed to support animated gifs, although stb_image doesn't support it
5749 static stbi_uc *stbi__gif_load_next(stbi__context *s, stbi__gif *g, int *comp, int req_comp)
5750 {
5751    int i;
5752    stbi_uc *prev_out = 0;
5753 
5754    if (g->out == 0 && !stbi__gif_header(s, g, comp,0))
5755       return 0; // stbi__g_failure_reason set by stbi__gif_header
5756 
5757    if(g->w <= 0 || g->h <= 0 ||
5758            (4 > (INT_MAX / g->w / g->h)))
5759        return stbi__errpuc("Integer Overflow", "width or height too big");
5760 
5761    prev_out = g->out;
5762    g->out = (stbi_uc *) stbi__malloc(4 * g->w * g->h);
5763    if (g->out == 0) return stbi__errpuc("outofmem", "Out of memory");
5764 
5765    switch ((g->eflags & 0x1C) >> 2) {
5766       case 0: // unspecified (also always used on 1st frame)
5767          stbi__fill_gif_background(g, 0, 0, 4 * g->w, 4 * g->w * g->h);
5768          break;
5769       case 1: // do not dispose
5770          if (prev_out) memcpy(g->out, prev_out, 4 * g->w * g->h);
5771          g->old_out = prev_out;
5772          break;
5773       case 2: // dispose to background
5774          if (prev_out) memcpy(g->out, prev_out, 4 * g->w * g->h);
5775          stbi__fill_gif_background(g, g->start_x, g->start_y, g->max_x, g->max_y);
5776          break;
5777       case 3: // dispose to previous
5778          if (g->old_out) {
5779             for (i = g->start_y; i < g->max_y; i += 4 * g->w)
5780                memcpy(&g->out[i + g->start_x], &g->old_out[i + g->start_x], g->max_x - g->start_x);
5781          }
5782          break;
5783    }
5784 
5785    for (;;) {
5786       switch (stbi__get8(s)) {
5787          case 0x2C: /* Image Descriptor */
5788          {
5789             int prev_trans = -1;
5790             stbi__int32 x, y, w, h;
5791             stbi_uc *o;
5792 
5793             x = stbi__get16le(s);
5794             y = stbi__get16le(s);
5795             w = stbi__get16le(s);
5796             h = stbi__get16le(s);
5797             if (((x + w) > (g->w)) || ((y + h) > (g->h)))
5798                return stbi__errpuc("bad Image Descriptor", "Corrupt GIF");
5799 
5800             g->line_size = g->w * 4;
5801             g->start_x = x * 4;
5802             g->start_y = y * g->line_size;
5803             g->max_x   = g->start_x + w * 4;
5804             g->max_y   = g->start_y + h * g->line_size;
5805             g->cur_x   = g->start_x;
5806             g->cur_y   = g->start_y;
5807 
5808             g->lflags = stbi__get8(s);
5809 
5810             if (g->lflags & 0x40) {
5811                g->step = 8 * g->line_size; // first interlaced spacing
5812                g->parse = 3;
5813             } else {
5814                g->step = g->line_size;
5815                g->parse = 0;
5816             }
5817 
5818             if (g->lflags & 0x80) {
5819                stbi__gif_parse_colortable(s,g->lpal, 2 << (g->lflags & 7), g->eflags & 0x01 ? g->transparent : -1);
5820                g->color_table = (stbi_uc *) g->lpal;
5821             } else if (g->flags & 0x80) {
5822                if (g->transparent >= 0 && (g->eflags & 0x01)) {
5823                   prev_trans = g->pal[g->transparent][3];
5824                   g->pal[g->transparent][3] = 0;
5825                }
5826                g->color_table = (stbi_uc *) g->pal;
5827             } else
5828                return stbi__errpuc("missing color table", "Corrupt GIF");
5829 
5830             o = stbi__process_gif_raster(s, g);
5831             if (o == NULL) return NULL;
5832 
5833             if (prev_trans != -1)
5834                g->pal[g->transparent][3] = (stbi_uc) prev_trans;
5835 
5836             return o;
5837          }
5838 
5839          case 0x21: // Extension Introducer (Graphic Control, Comment, etc.)
5840          {
5841             int len;
5842             if (stbi__get8(s) == 0xF9) { // Graphic Control Extension.
5843                len = stbi__get8(s);
5844                if (len == 4) {
5845                   g->eflags = stbi__get8(s);
5846                   g->delay = stbi__get16le(s);
5847                   g->transparent = stbi__get8(s);
5848                } else {
5849                   stbi__skip(s, len);
5850                   break;
5851                }
5852             }
5853             while ((len = stbi__get8(s)) != 0)
5854                stbi__skip(s, len);
5855             break;
5856          }
5857 
5858          case 0x3B: // gif stream termination code
5859             return (stbi_uc *) s; // using '1' causes warning on some compilers
5860 
5861          default:
5862             return stbi__errpuc("unknown code", "Corrupt GIF");
5863       }
5864    }
5865 
5866    STBI_NOTUSED(req_comp);
5867 }
5868 
5869 static stbi_uc *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
5870 {
5871    stbi_uc *u = 0;
5872    stbi__gif g;
5873    memset(&g, 0, sizeof(g));
5874 
5875    u = stbi__gif_load_next(s, &g, comp, req_comp);
5876    if (u == (stbi_uc *) s) u = 0;  // end of animated gif marker
5877    if (u) {
5878       *x = g.w;
5879       *y = g.h;
5880       if (req_comp && req_comp != 4)
5881          u = stbi__convert_format(u, 4, req_comp, g.w, g.h);
5882    }
5883    else if (g.out)
5884       STBI_FREE(g.out);
5885 
5886    return u;
5887 }
5888 
5889 static int stbi__gif_info(stbi__context *s, int *x, int *y, int *comp)
5890 {
5891    return stbi__gif_info_raw(s,x,y,comp);
5892 }
5893 #endif
5894 
5895 // *************************************************************************************************
5896 // Radiance RGBE HDR loader
5897 // originally by Nicolas Schulz
5898 #ifndef STBI_NO_HDR
5899 static int stbi__hdr_test_core(stbi__context *s)
5900 {
5901    const char *signature = "#?RADIANCE\n";
5902    int i;
5903    for (i=0; signature[i]; ++i)
5904       if (stbi__get8(s) != signature[i])
5905          return 0;
5906    return 1;
5907 }
5908 
5909 static int stbi__hdr_test(stbi__context* s)
5910 {
5911    int r = stbi__hdr_test_core(s);
5912    stbi__rewind(s);
5913    return r;
5914 }
5915 
5916 #define STBI__HDR_BUFLEN  1024
5917 static char *stbi__hdr_gettoken(stbi__context *z, char *buffer)
5918 {
5919    int len=0;
5920    char c = '\0';
5921 
5922    c = (char) stbi__get8(z);
5923 
5924    while (!stbi__at_eof(z) && c != '\n') {
5925       buffer[len++] = c;
5926       if (len == STBI__HDR_BUFLEN-1) {
5927          // flush to end of line
5928          while (!stbi__at_eof(z) && stbi__get8(z) != '\n')
5929             ;
5930          break;
5931       }
5932       c = (char) stbi__get8(z);
5933    }
5934 
5935    buffer[len] = 0;
5936    return buffer;
5937 }
5938 
5939 static void stbi__hdr_convert(float *output, stbi_uc *input, int req_comp)
5940 {
5941    if ( input[3] != 0 ) {
5942       float f1;
5943       // Exponent
5944       f1 = (float) ldexp(1.0f, input[3] - (int)(128 + 8));
5945       if (req_comp <= 2)
5946          output[0] = (input[0] + input[1] + input[2]) * f1 / 3;
5947       else {
5948          output[0] = input[0] * f1;
5949          output[1] = input[1] * f1;
5950          output[2] = input[2] * f1;
5951       }
5952       if (req_comp == 2) output[1] = 1;
5953       if (req_comp == 4) output[3] = 1;
5954    } else {
5955       switch (req_comp) {
5956          case 4: output[3] = 1; /* fallthrough */
5957          case 3: output[0] = output[1] = output[2] = 0;
5958                  break;
5959          case 2: output[1] = 1; /* fallthrough */
5960          case 1: output[0] = 0;
5961                  break;
5962       }
5963    }
5964 }
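
// Illustration only, not part of stb_image: a minimal sketch of the RGBE to
// float conversion above for the three-component case. Each mantissa byte is
// scaled by 2^(exponent - 136), and a zero exponent byte means black. The
// name demo_rgbe_to_float is hypothetical.

```c
#include <math.h>

// Convert one RGBE pixel to linear float RGB.
static void demo_rgbe_to_float(const unsigned char rgbe[4], float rgb[3])
{
   if (rgbe[3] != 0) {
      // shared exponent: bias of 128, plus 8 to account for the 8-bit mantissas
      float scale = (float) ldexp(1.0, rgbe[3] - (128 + 8));
      rgb[0] = rgbe[0] * scale;
      rgb[1] = rgbe[1] * scale;
      rgb[2] = rgbe[2] * scale;
   } else {
      rgb[0] = rgb[1] = rgb[2] = 0.0f;
   }
}
```

// For example, the pixel {128, 64, 32, 129} has scale 2^-7, giving
// (1.0, 0.5, 0.25).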
5965 
5966 static float *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
5967 {
5968    char buffer[STBI__HDR_BUFLEN];
5969    char *token;
5970    int valid = 0;
5971    int width, height;
5972    stbi_uc *scanline;
5973    float *hdr_data;
5974    int len;
5975    unsigned char count, value;
5976    int i, j, k, c1,c2, z;
5977 
5978 
5979    // Check identifier
5980    if (strcmp(stbi__hdr_gettoken(s,buffer), "#?RADIANCE") != 0)
5981       return stbi__errpf("not HDR", "Corrupt HDR image");
5982 
5983    // Parse header
5984    for(;;) {
5985       token = stbi__hdr_gettoken(s,buffer);
5986       if (token[0] == 0) break;
5987       if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
5988    }
5989 
5990    if (!valid)    return stbi__errpf("unsupported format", "Unsupported HDR format");
5991 
5992    // Parse width and height
5993    // can't use sscanf() if we're not using stdio!
5994    token = stbi__hdr_gettoken(s,buffer);
5995    if (strncmp(token, "-Y ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
5996    token += 3;
5997    height = (int) strtol(token, &token, 10);
5998    while (*token == ' ') ++token;
5999    if (strncmp(token, "+X ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
6000    token += 3;
6001    width = (int) strtol(token, NULL, 10);
6002 
6003    *x = width;
6004    *y = height;
6005 
6006    if (comp) *comp = 3;
6007    if (req_comp == 0) req_comp = 3;
6008 
6009    if (height <= 0 || width <= 0 || req_comp <= 0 ||
6010            (sizeof(float) > (INT_MAX / req_comp / height / width)))
6011        return stbi__errpf("Integer Overflow", "w or h incorrect");
6012    // Read data
6013    hdr_data = (float *) stbi__malloc(height * width * req_comp * sizeof(float));
6014    if (hdr_data == NULL) return stbi__errpf("outofmem", "Out of memory");
6015 
6016    // Load image data
6017    // image data is stored as some number of sca
6018    if ( width < 8 || width >= 32768) {
6019       // Read flat data
6020       for (j=0; j < height; ++j) {
6021          for (i=0; i < width; ++i) {
6022             stbi_uc rgbe[4];
6023            main_decode_loop:
6024             stbi__getn(s, rgbe, 4);
6025             stbi__hdr_convert(hdr_data + j * width * req_comp + i * req_comp, rgbe, req_comp);
6026          }
6027       }
6028    } else {
6029       // Read RLE-encoded data
6030       scanline = NULL;
6031 
6032       for (j = 0; j < height; ++j) {
6033          c1 = stbi__get8(s);
6034          c2 = stbi__get8(s);
6035          len = stbi__get8(s);
6036          if (c1 != 2 || c2 != 2 || (len & 0x80)) {
6037             // not run-length encoded, so we have to actually use THIS data as a decoded
6038             // pixel (note this can't be a valid pixel--one of RGB must be >= 128)
6039             stbi_uc rgbe[4];
6040             rgbe[0] = (stbi_uc) c1;
6041             rgbe[1] = (stbi_uc) c2;
6042             rgbe[2] = (stbi_uc) len;
6043             rgbe[3] = (stbi_uc) stbi__get8(s);
6044             stbi__hdr_convert(hdr_data, rgbe, req_comp);
6045             i = 1;
6046             j = 0;
6047             STBI_FREE(scanline);
6048             goto main_decode_loop; // yes, this makes no sense
6049          }
6050          len <<= 8;
6051          len |= stbi__get8(s);
6052          if (len != width) {
6053             STBI_FREE(hdr_data);
6054             STBI_FREE(scanline);
6055             return stbi__errpf("invalid decoded scanline length", "corrupt HDR");
6056          }
6057          if (scanline == NULL) scanline = (stbi_uc *) stbi__malloc(width * 4);
6058 
6059          for (k = 0; k < 4; ++k) {
6060             i = 0;
6061             while (i < width) {
6062                count = stbi__get8(s);
6063                if (count > 128) {
6064                   // Run
6065                   value = stbi__get8(s);
6066                   count -= 128;
6067                   if (count > width - i) { // a run may end exactly at the end of the scanline
6068                      STBI_FREE(hdr_data);
6069                      STBI_FREE(scanline);
6070                      return stbi__errpf("invalid buffer size", "corrupt HDR");
6071                   }
6072                   for (z = 0; z < count; ++z)
6073                      scanline[i++ * 4 + k] = value;
6074                } else {
6075                   if (count == 0 || count > width - i) { // zero-length dumps would never advance i
6076                      STBI_FREE(hdr_data);
6077                      STBI_FREE(scanline);
6078                      return stbi__errpf("invalid buffer size", "corrupt HDR");
6079                   }
6080                   // Dump
6081                   for (z = 0; z < count; ++z)
6082                      scanline[i++ * 4 + k] = stbi__get8(s);
6083                }
6084             }
6085          }
6086          for (i=0; i < width; ++i)
6087             stbi__hdr_convert(hdr_data+(j*width + i)*req_comp, scanline + i*4, req_comp);
6088       }
6089       STBI_FREE(scanline);
6090    }
6091 
6092    return hdr_data;
6093 }
6094 
6095 static int stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp)
6096 {
6097    char buffer[STBI__HDR_BUFLEN];
6098    char *token;
6099    int valid = 0;
6100 
6101    if (strcmp(stbi__hdr_gettoken(s,buffer), "#?RADIANCE") != 0) {
6102        stbi__rewind( s );
6103        return 0;
6104    }
6105 
6106    for(;;) {
6107       token = stbi__hdr_gettoken(s,buffer);
6108       if (token[0] == 0) break;
6109       if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
6110    }
6111 
6112    if (!valid) {
6113        stbi__rewind( s );
6114        return 0;
6115    }
6116    token = stbi__hdr_gettoken(s,buffer);
6117    if (strncmp(token, "-Y ", 3)) {
6118        stbi__rewind( s );
6119        return 0;
6120    }
6121    token += 3;
6122    *y = (int) strtol(token, &token, 10);
6123    while (*token == ' ') ++token;
6124    if (strncmp(token, "+X ", 3)) {
6125        stbi__rewind( s );
6126        return 0;
6127    }
6128    token += 3;
6129    *x = (int) strtol(token, NULL, 10);
6130    *comp = 3;
6131    return 1;
6132 }
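
// Illustration only, not part of stb_image: the resolution line handling used
// by both HDR routines above can be sketched as one helper that accepts only
// the "-Y <height> +X <width>" layout and parses it with strtol. The name
// demo_hdr_parse_resolution is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

// Parse a Radiance resolution line of the form "-Y <height> +X <width>".
// Returns 1 on success, 0 if the line uses any other layout.
static int demo_hdr_parse_resolution(const char *line, int *width, int *height)
{
   char *end;
   if (strncmp(line, "-Y ", 3) != 0) return 0;
   line += 3;
   *height = (int) strtol(line, &end, 10);   // end is left just past the digits
   while (*end == ' ') ++end;
   if (strncmp(end, "+X ", 3) != 0) return 0;
   end += 3;
   *width = (int) strtol(end, NULL, 10);
   return 1;
}
```

// For example, "-Y 480 +X 640" yields width 640 and height 480, while
// "+X 640 -Y 480" is rejected.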
6133 #endif // STBI_NO_HDR
6134 
6135 #ifndef STBI_NO_BMP
6136 static int stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp)
6137 {
6138    int hsz;
6139    if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M') {
6140        stbi__rewind( s );
6141        return 0;
6142    }
6143    stbi__skip(s,12);
6144    hsz = stbi__get32le(s);
6145    if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) {
6146        stbi__rewind( s );
6147        return 0;
6148    }
6149    if (hsz == 12) {
6150       *x = stbi__get16le(s);
6151       *y = stbi__get16le(s);
6152    } else {
6153       *x = stbi__get32le(s);
6154       *y = stbi__get32le(s);
6155    }
6156    if (stbi__get16le(s) != 1) {
6157        stbi__rewind( s );
6158        return 0;
6159    }
6160    *comp = stbi__get16le(s) / 8;
6161    return 1;
6162 }
6163 #endif
6164 
6165 #ifndef STBI_NO_PSD
6166 static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp)
6167 {
6168    int channelCount;
6169    if (stbi__get32be(s) != 0x38425053) {
6170        stbi__rewind( s );
6171        return 0;
6172    }
6173    if (stbi__get16be(s) != 1) {
6174        stbi__rewind( s );
6175        return 0;
6176    }
6177    stbi__skip(s, 6);
6178    channelCount = stbi__get16be(s);
6179    if (channelCount < 0 || channelCount > 16) {
6180        stbi__rewind( s );
6181        return 0;
6182    }
6183    *y = stbi__get32be(s);
6184    *x = stbi__get32be(s);
6185    if (stbi__get16be(s) != 8) {
6186        stbi__rewind( s );
6187        return 0;
6188    }
6189    if (stbi__get16be(s) != 3) {
6190        stbi__rewind( s );
6191        return 0;
6192    }
6193    *comp = 4;
6194    return 1;
6195 }
6196 #endif
6197 
6198 #ifndef STBI_NO_PIC
6199 static int stbi__pic_info(stbi__context *s, int *x, int *y, int *comp)
6200 {
6201    int act_comp=0,num_packets=0,chained;
6202    stbi__pic_packet packets[10];
6203 
6204    if (!stbi__pic_is4(s,"\x53\x80\xF6\x34")) {
6205       stbi__rewind(s);
6206       return 0;
6207    }
6208 
6209    stbi__skip(s, 88);
6210 
6211    *x = stbi__get16be(s);
6212    *y = stbi__get16be(s);
6213    if (stbi__at_eof(s)) {
6214       stbi__rewind( s);
6215       return 0;
6216    }
6217    if ( (*x) != 0 && (1 << 28) / (*x) < (*y)) {
6218       stbi__rewind( s );
6219       return 0;
6220    }
6221 
6222    stbi__skip(s, 8);
6223 
6224    do {
6225       stbi__pic_packet *packet;
6226 
6227       if (num_packets==sizeof(packets)/sizeof(packets[0]))
6228          return 0;
6229 
6230       packet = &packets[num_packets++];
6231       chained = stbi__get8(s);
6232       packet->size    = stbi__get8(s);
6233       packet->type    = stbi__get8(s);
6234       packet->channel = stbi__get8(s);
6235       act_comp |= packet->channel;
6236 
6237       if (stbi__at_eof(s)) {
6238           stbi__rewind( s );
6239           return 0;
6240       }
6241       if (packet->size != 8) {
6242           stbi__rewind( s );
6243           return 0;
6244       }
6245    } while (chained);
6246 
6247    *comp = (act_comp & 0x10 ? 4 : 3);
6248 
6249    return 1;
6250 }
6251 #endif
6252 
6253 // *************************************************************************************************
6254 // Portable Gray Map and Portable Pixel Map loader
6255 // by Ken Miller
6256 //
6257 // PGM: http://netpbm.sourceforge.net/doc/pgm.html
6258 // PPM: http://netpbm.sourceforge.net/doc/ppm.html
6259 //
6260 // Known limitations:
6261 //    Does not support comments in the header section
6262 //    Does not support ASCII image data (formats P2 and P3)
6263 //    Does not support 16-bit-per-channel
6264 
6265 #ifndef STBI_NO_PNM
6266 
6267 static int      stbi__pnm_test(stbi__context *s)
6268 {
6269    char p, t;
6270    p = (char) stbi__get8(s);
6271    t = (char) stbi__get8(s);
6272    if (p != 'P' || (t != '5' && t != '6')) {
6273        stbi__rewind( s );
6274        return 0;
6275    }
6276    return 1;
6277 }
6278 
6279 static stbi_uc *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
6280 {
6281    stbi_uc *out;
6282    if (!stbi__pnm_info(s, (int *)&s->img_x, (int *)&s->img_y, (int *)&s->img_n))
6283       return 0;
6284 
6285    *x = s->img_x;
6286    *y = s->img_y;
6287 
6288    if (*x <= 0 || *y <= 0)
6289       return stbi__errpuc("Integer overflow", "img_x or img_y incorrect");
6290 
6291    *comp = s->img_n;
6292    if (s->img_x == 0 || s->img_y == 0 || s->img_n <= 0
6293          || (s->img_n > (INT_MAX / s->img_x / s->img_y)))
6294       return stbi__errpuc("Integer Overflow", "x or y incorrect");
6295    out = (stbi_uc *) stbi__malloc(s->img_n * s->img_x * s->img_y);
6296    if (!out) return stbi__errpuc("outofmem", "Out of memory");
6297    stbi__getn(s, out, s->img_n * s->img_x * s->img_y);
6298 
6299    if (req_comp && req_comp != s->img_n) {
6300       out = stbi__convert_format(out, s->img_n, req_comp, s->img_x, s->img_y);
6301       if (out == NULL) return out; // stbi__convert_format frees input on failure
6302    }
6303    return out;
6304 }
6305 
6306 static int      stbi__pnm_isspace(char c)
6307 {
6308    return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
6309 }
6310 
6311 static void     stbi__pnm_skip_whitespace(stbi__context *s, char *c)
6312 {
6313    while (!stbi__at_eof(s) && stbi__pnm_isspace(*c))
6314       *c = (char) stbi__get8(s);
6315 }
6316 
6317 static int      stbi__pnm_isdigit(char c)
6318 {
6319    return c >= '0' && c <= '9';
6320 }
6321 
6322 static int      stbi__pnm_getinteger(stbi__context *s, char *c)
6323 {
6324    int value = 0;
6325 
6326    while (!stbi__at_eof(s) && stbi__pnm_isdigit(*c)) {
6327       value = value*10 + (*c - '0');
6328       *c = (char) stbi__get8(s);
6329    }
6330 
6331    return value;
6332 }
6333 
6334 static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp)
6335 {
6336    int maxv;
6337    char c, p, t;
6338 
6339    stbi__rewind( s );
6340 
6341    // Get identifier
6342    p = (char) stbi__get8(s);
6343    t = (char) stbi__get8(s);
6344    if (p != 'P' || (t != '5' && t != '6')) {
6345        stbi__rewind( s );
6346        return 0;
6347    }
6348 
6349    *comp = (t == '6') ? 3 : 1;  // '5' is 1-component .pgm; '6' is 3-component .ppm
6350 
6351    c = (char) stbi__get8(s);
6352    stbi__pnm_skip_whitespace(s, &c);
6353 
6354    *x = stbi__pnm_getinteger(s, &c); // read width
6355    stbi__pnm_skip_whitespace(s, &c);
6356 
6357    *y = stbi__pnm_getinteger(s, &c); // read height
6358    stbi__pnm_skip_whitespace(s, &c);
6359 
6360    maxv = stbi__pnm_getinteger(s, &c);  // read max value
6361 
6362    if (maxv > 255)
6363       return stbi__err("max value > 255", "PPM image not 8-bit");
6364    else
6365       return 1;
6366 }
6367 #endif
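
/*
   Illustrative sketch only, not part of stb_image: the header parsing above,
   rewritten against a plain memory buffer so it can be read and run without
   the rest of the library. Every example_* name is hypothetical, and
   example_reader is a stand-in for stbi__context. It follows the same steps:
   magic number, then width, height and max value separated by whitespace.
   Like the code above, it does not skip '#' comment lines.
*/

```c
#include <stddef.h>

/* Hypothetical cursor over an in-memory buffer. */
typedef struct {
   const unsigned char *p, *end;
} example_reader;

/* Returns the next byte, or 0 once the input is exhausted. */
static int example_get8(example_reader *r)
{
   return r->p < r->end ? *r->p++ : 0;
}

static int example_isspace(char c)
{
   return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
}

/* *c holds the most recently read byte; advance it past whitespace. */
static void example_skip_ws(example_reader *r, char *c)
{
   while (r->p < r->end && example_isspace(*c))
      *c = (char) example_get8(r);
}

/* Accumulate decimal digits starting at *c (no overflow check, as above). */
static int example_getint(example_reader *r, char *c)
{
   int value = 0;
   while (r->p < r->end && *c >= '0' && *c <= '9') {
      value = value*10 + (*c - '0');
      *c = (char) example_get8(r);
   }
   return value;
}

/* Returns 1 and fills x, y, comp for an 8-bit binary PGM/PPM header, else 0. */
static int example_pnm_info(const unsigned char *buf, size_t len,
                            int *x, int *y, int *comp)
{
   example_reader r;
   char p, t, c;
   int maxv;

   r.p = buf;
   r.end = buf + len;

   p = (char) example_get8(&r);
   t = (char) example_get8(&r);
   if (p != 'P' || (t != '5' && t != '6')) return 0;
   *comp = (t == '6') ? 3 : 1;

   c = (char) example_get8(&r);
   example_skip_ws(&r, &c);
   *x = example_getint(&r, &c);
   example_skip_ws(&r, &c);
   *y = example_getint(&r, &c);
   example_skip_ws(&r, &c);
   maxv = example_getint(&r, &c);

   return maxv <= 255;
}
```

/*
   Because the digit loop stops as soon as the input is exhausted, the sketch
   relies on at least one byte following the max value, which is always the
   case in a well-formed file (a single whitespace byte precedes the pixels).
*/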

static int stbi__info_main(stbi__context *s, int *x, int *y, int *comp)
{
   #ifndef STBI_NO_JPEG
   if (stbi__jpeg_info(s, x, y, comp)) return 1;
   #endif

   #ifndef STBI_NO_PNG
   if (stbi__png_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_GIF
   if (stbi__gif_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_BMP
   if (stbi__bmp_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_PSD
   if (stbi__psd_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_PIC
   if (stbi__pic_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_PNM
   if (stbi__pnm_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_HDR
   if (stbi__hdr_info(s, x, y, comp))  return 1;
   #endif

   // test tga last because it's a crappy test!
   #ifndef STBI_NO_TGA
   if (stbi__tga_info(s, x, y, comp))
       return 1;
   #endif
   return stbi__err("unknown image type", "Image not of any known type, or corrupt");
}

#ifndef STBI_NO_STDIO
STBIDEF int stbi_info(char const *filename, int *x, int *y, int *comp)
{
    FILE *f = stbi__fopen(filename, "rb");
    int result;
    if (!f) return stbi__err("can't fopen", "Unable to open file");
    result = stbi_info_from_file(f, x, y, comp);
    fclose(f);
    return result;
}

STBIDEF int stbi_info_from_file(FILE *f, int *x, int *y, int *comp)
{
   int r;
   stbi__context s;
   long pos = ftell(f);   // remember where the caller left the stream
   stbi__start_file(&s, f);
   r = stbi__info_main(&s,x,y,comp);
   fseek(f,pos,SEEK_SET); // restore it so a later load starts at the same place
   return r;
}
#endif // !STBI_NO_STDIO
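
/*
   Illustrative sketch only, not part of stb_image: the save-and-restore
   pattern used by stbi_info_from_file above, shown with a trivial probe in
   place of stbi__info_main. Both example_* names are hypothetical. Unlike the
   library function, this sketch also checks the results of ftell and fseek,
   which is an added assumption about how a caller might want errors reported.
*/

```c
#include <stdio.h>

/* Hypothetical probe: consumes two bytes and reports whether they are "P6". */
static int example_probe_is_ppm(FILE *f)
{
   int a = fgetc(f);
   int b = fgetc(f);
   return a == 'P' && b == '6';
}

/* Runs the probe, then puts the stream back where the caller left it. */
static int example_probe_preserving_pos(FILE *f)
{
   long pos = ftell(f);
   int r;
   if (pos < 0) return 0;                      /* stream is not seekable */
   r = example_probe_is_ppm(f);
   if (fseek(f, pos, SEEK_SET) != 0) return 0; /* could not restore */
   return r;
}
```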

STBIDEF int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp)
{
   stbi__context s;
   stbi__start_mem(&s,buffer,len);
   return stbi__info_main(&s,x,y,comp);
}

STBIDEF int stbi_info_from_callbacks(stbi_io_callbacks const *c, void *user, int *x, int *y, int *comp)
{
   stbi__context s;
   stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
   return stbi__info_main(&s,x,y,comp);
}

#endif // STB_IMAGE_IMPLEMENTATION

/*
   revision history:
      2.08  (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA
      2.07  (2015-09-13) fix compiler warnings
                         partial animated GIF support
                         limited 16-bit PSD support
                         #ifdef unused functions
                         bug with < 92 byte PIC,PNM,HDR,TGA
      2.06  (2015-04-19) fix bug where PSD returns wrong '*comp' value
      2.05  (2015-04-19) fix bug in progressive JPEG handling, fix warning
      2.04  (2015-04-15) try to re-enable SIMD on MinGW 64-bit
      2.03  (2015-04-12) extra corruption checking (mmozeiko)
                         stbi_set_flip_vertically_on_load (nguillemot)
                         fix NEON support; fix mingw support
      2.02  (2015-01-19) fix incorrect assert, fix warning
      2.01  (2015-01-17) fix various warnings; suppress SIMD on gcc 32-bit without -msse2
      2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
      2.00  (2014-12-25) optimize JPG, including x86 SSE2 & NEON SIMD (ryg)
                         progressive JPEG (stb)
                         PGM/PPM support (Ken Miller)
                         STBI_MALLOC,STBI_REALLOC,STBI_FREE
                         GIF bugfix -- seemingly never worked
                         STBI_NO_*, STBI_ONLY_*
      1.48  (2014-12-14) fix incorrectly-named assert()
      1.47  (2014-12-14) 1/2/4-bit PNG support, both direct and paletted (Omar Cornut & stb)
                         optimize PNG (ryg)
                         fix bug in interlaced PNG with user-specified channel count (stb)
      1.46  (2014-08-26)
              fix broken tRNS chunk (colorkey-style transparency) in non-paletted PNG
      1.45  (2014-08-16)
              fix MSVC-ARM internal compiler error by wrapping malloc
      1.44  (2014-08-07)
              various warning fixes from Ronny Chevalier
      1.43  (2014-07-15)
              fix MSVC-only compiler problem in code changed in 1.42
      1.42  (2014-07-09)
              don't define _CRT_SECURE_NO_WARNINGS (affects user code)
              fixes to stbi__cleanup_jpeg path
              added STBI_ASSERT to avoid requiring assert.h
      1.41  (2014-06-25)
              fix search&replace from 1.36 that messed up comments/error messages
      1.40  (2014-06-22)
              fix gcc struct-initialization warning
      1.39  (2014-06-15)
              fix to TGA optimization when req_comp != number of components in TGA;
              fix to GIF loading because BMP wasn't rewinding (whoops, no GIFs in my test suite)
              add support for BMP version 5 (more ignored fields)
      1.38  (2014-06-06)
              suppress MSVC warnings on integer casts truncating values
              fix accidental rename of 'skip' field of I/O
      1.37  (2014-06-04)
              remove duplicate typedef
      1.36  (2014-06-03)
              convert to header file single-file library
              if de-iphone isn't set, load iphone images color-swapped instead of returning NULL
      1.35  (2014-05-27)
              various warnings
              fix broken STBI_SIMD path
              fix bug where stbi_load_from_file no longer left file pointer in correct place
              fix broken non-easy path for 32-bit BMP (possibly never used)
              TGA optimization by Arseny Kapoulkine
      1.34  (unknown)
              use STBI_NOTUSED in stbi__resample_row_generic(), fix one more leak in tga failure case
      1.33  (2011-07-14)
              make stbi_is_hdr work in STBI_NO_HDR (as specified), minor compiler-friendly improvements
      1.32  (2011-07-13)
              support for "info" function for all supported filetypes (SpartanJ)
      1.31  (2011-06-20)
              a few more leak fixes, bug in PNG handling (SpartanJ)
      1.30  (2011-06-11)
              added ability to load files via callbacks to accommodate custom input streams (Ben Wenger)
              removed deprecated format-specific test/load functions
              removed support for installable file formats (stbi_loader) -- would have been broken for IO callbacks anyway
              error cases in bmp and tga give messages and don't leak (Raymond Barbiero, grisha)
              fix inefficiency in decoding 32-bit BMP (David Woo)
      1.29  (2010-08-16)
              various warning fixes from Aurelien Pocheville
      1.28  (2010-08-01)
              fix bug in GIF palette transparency (SpartanJ)
      1.27  (2010-08-01)
              cast-to-stbi_uc to fix warnings
      1.26  (2010-07-24)
              fix bug in file buffering for PNG reported by SpartanJ
      1.25  (2010-07-17)
              refix trans_data warning (Won Chun)
      1.24  (2010-07-12)
              perf improvements reading from files on platforms with lock-heavy fgetc()
              minor perf improvements for jpeg
              deprecated type-specific functions so we'll get feedback if they're needed
              attempt to fix trans_data warning (Won Chun)
      1.23    fixed bug in iPhone support
      1.22  (2010-07-10)
              removed image *writing* support
              stbi_info support from Jetro Lauha
              GIF support from Jean-Marc Lienher
              iPhone PNG-extensions from James Brown
              warning-fixes from Nicolas Schulz and Janez Zemva (i.e. Janez Žemva)
      1.21    fix use of 'stbi_uc' in header (reported by jon blow)
      1.20    added support for Softimage PIC, by Tom Seddon
      1.19    bug in interlaced PNG corruption check (found by ryg)
      1.18  (2008-08-02)
              fix a threading bug (local mutable static)
      1.17    support interlaced PNG
      1.16    major bugfix - stbi__convert_format converted one too many pixels
      1.15    initialize some fields for thread safety
      1.14    fix threadsafe conversion bug
              header-file-only version (#define STBI_HEADER_FILE_ONLY before including)
      1.13    threadsafe
      1.12    const qualifiers in the API
      1.11    Support installable IDCT, colorspace conversion routines
      1.10    Fixes for 64-bit (don't use "unsigned long")
              optimized upsampling by Fabian "ryg" Giesen
      1.09    Fix format-conversion for PSD code (bad global variables!)
      1.08    Thatcher Ulrich's PSD code integrated by Nicolas Schulz
      1.07    attempt to fix C++ warning/errors again
      1.06    attempt to fix C++ warning/errors again
      1.05    fix TGA loading to return correct *comp and use good luminance calc
      1.04    default float alpha is 1, not 255; use 'void *' for stbi_image_free
      1.03    bugfixes to STBI_NO_STDIO, STBI_NO_HDR
      1.02    support for (subset of) HDR files, float interface for preferred access to them
      1.01    fix bug: possible bug in handling right-side up bmps... not sure
              fix bug: the stbi__bmp_load() and stbi__tga_load() functions didn't work at all
      1.00    interface to zlib that skips zlib header
      0.99    correct handling of alpha in palette
      0.98    TGA loader by lonesock; dynamically add loaders (untested)
      0.97    jpeg errors on too large a file; also catch another malloc failure
      0.96    fix detection of invalid v value - particleman@mollyrocket forum
      0.95    during header scan, seek to markers in case of padding
      0.94    STBI_NO_STDIO to disable stdio usage; rename all #defines the same
      0.93    handle jpegtran output; verbose errors
      0.92    read 4,8,16,24,32-bit BMP files of several formats
      0.91    output 24-bit Windows 3.0 BMP files
      0.90    fix a few more warnings; bump version number to approach 1.0
      0.61    bugfixes due to Marc LeBlanc, Christopher Lloyd
      0.60    fix compiling as c++
      0.59    fix warnings: merge Dave Moore's -Wall fixes
      0.58    fix bug: zlib uncompressed mode len/nlen was wrong endian
      0.57    fix bug: jpg last huffman symbol before marker was >9 bits but less than 16 available
      0.56    fix bug: zlib uncompressed mode len vs. nlen
      0.55    fix bug: restart_interval not initialized to 0
      0.54    allow NULL for 'int *comp'
      0.53    fix bug in png 3->4; speedup png decoding
      0.52    png handles req_comp=3,4 directly; minor cleanup; jpeg comments
      0.51    obey req_comp requests, 1-component jpegs return as 1-component,
              on 'test' only check type, not whether we support this variant
      0.50  (2006-11-19)
              first released version
*/
