The Raspberry Pi is probably the best gateway into IoT, embedded computing, and general programming, but for high-performance applications such as deep learning, self-driving cars, or computer vision, the Nvidia Jetson Nano steals the show. With a 128-core Nvidia Maxwell GPU, 2-4GB of LPDDR4-3200 memory on a 64-bit bus, and a quad-core Arm Cortex-A57 CPU @ 1.43GHz, the Nvidia Jetson Nano has all the guts necessary to run more complicated projects, particularly in the GPU department. The Jetson Nano sports a set of Pi-compatible GPIO pins, so most GPIO code that runs on a Pi should work on a Jetson Nano with little modification. Unfortunately, the pins cannot supply nearly as much current as the Raspberry Pi's pins, although this is not a problem for most simple circuits. The Jetson Nano costs $100 with 4GB of RAM and $60 with 2GB. This is a hefty increase over the Raspberry Pi 4's $35 for 2GB, $55 for 4GB, and $75 for 8GB. That is, until you remember that you're getting roughly a Nintendo Switch in terms of performance. While the Raspberry Pi 4's quad-core 1.5GHz Arm Cortex-A72 CPU is theoretically faster than the Jetson Nano's quad-core 1.43GHz Arm Cortex-A57, in my experience the Jetson Nano is no slower in regular operation, and it can handle most basic computational tasks as well as some AI-related workflows thanks to its GPU. Nvidia's custom Ubuntu 18.04 LTS OS with the Unity desktop is excellent for everyday computing and development alike, and most people would probably consider it more visually appealing than Raspberry Pi OS. The Jetson Nano performs admirably in low-power situations as well: users can set it to either a 5W mode or a 10W mode (MAX-N). For reference, the Raspberry Pi 4B draws approximately 3W at idle and has a maximum power budget of 9W. I suspect the Jetson Nano can pack so much muscle into such a power-limited board thanks to the 20nm lithography of its cut-down Tegra X1 chip.
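On the GPIO point: Nvidia's Jetson.GPIO library exposes the same API as RPi.GPIO, which is why Pi code usually ports with little more than a changed import. The sketch below is a minimal illustration under that assumption, not code from either project. The `blink` helper and the choice of header pin 12 are hypothetical; it takes the GPIO module as an argument so the same function can drive either board.

```python
import time

# Try the Jetson library first, then the Pi library. Both can raise errors
# (not only ImportError) when imported on unsupported hardware, so any
# failure simply means "not on this board".
try:
    import Jetson.GPIO as GPIO
except Exception:
    try:
        import RPi.GPIO as GPIO
    except Exception:
        GPIO = None  # neither board library is usable here


def blink(gpio, pin, times=3, delay=0.5):
    """Blink an LED on a BOARD-numbered header pin using RPi.GPIO-style calls."""
    gpio.setmode(gpio.BOARD)  # number pins by position on the 40-pin header
    gpio.setup(pin, gpio.OUT)
    try:
        for _ in range(times):
            gpio.output(pin, gpio.HIGH)
            time.sleep(delay)
            gpio.output(pin, gpio.LOW)
            time.sleep(delay)
    finally:
        gpio.cleanup()  # release the pin even if the loop is interrupted


if __name__ == "__main__" and GPIO is not None:
    # Pin 12 is an arbitrary, hypothetical choice; wire an LED and resistor to it.
    blink(GPIO, pin=12)
```

Because `blink` only touches `setmode`, `setup`, `output`, and `cleanup`, the identical function body runs on a Pi or a Jetson Nano.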
One big advantage the Jetson Nano has over the Pi in terms of support is JetBot, Nvidia's open-source project to make self-driving car technology accessible to students everywhere. There are multiple hardware platforms that can be built or purchased, and the custom OS image is free to download. The JetBot ecosystem comes with a multitude of tutorials on how to get your robot to do various things, all accessed through Jupyter Notebooks. In short, the Jetson Nano is an excellent single-board computer that can serve as a full desktop replacement in addition to an artificial intelligence gateway computer. It has enough RAM, computing power, and community support to be useful to professionals, educators, and students. My verdict: buy.
When most people think of videogames, they think of first-person shooters, anything from the Mario or Zelda franchises, or Minecraft. It's instinctive; I used to think like that too. These are probably some of the most popular examples with the general public, but a completely different mold of game emerged near the turn of the millennium and experienced an explosion of popularity. Games in this category can be found in multiple genres, although they are by far the most numerous among RPGs (role-playing games). I'm not talking about an art style or a new hardware platform; I am referring to games that transcended the bounds of regular games and became a storytelling medium. Literature is a wonderful format for storytelling, but it has its weaknesses. It can be very engaging, but compared to film it lacks visuals and is more taxing on our senses, our eyes in this case. Feature-length films are often filled with suspense and evoke strong emotions such as awe, fear, sadness, and amusement. Films, however, are limited not by ease of delivery but by length, typically 1-3 hours. Television series fix this with long runtimes broken up into small segments that you can watch at your leisure. Everything I have described so far has been objective, and while I cannot find an objective improvement over television series, I can put forth a situational weakness: you are a passive observer entirely removed from the narrative. In videogames, a player is put directly into the action, where their decisions have a direct impact on the progression of the story. In rare cases they can even change the final outcome! The main character isn't the one exploring the temple, you are. You are the one driving the car, you are the one fighting the bad guys, you are the one meeting a god, you are the one saving the world. In videogames, you are the center of the show.
Videogames also tend to have longer playtimes than television shows, particularly shows made for streaming platforms, as those tend to be much shorter. Playtimes often exceed 50 and sometimes even 100 hours. While some of this time is spent bumbling around, a well-designed game will draw players in so that they can't advance the plot fast enough. I know this feeling very well; it's what I felt playing Xenoblade Chronicles 2 and Final Fantasy XV. Modern games have the additional advantage of being designed for powerful modern hardware, which allows numerous graphical effects that rival big-studio CGI movies to be generated in real time. Cutscenes can elevate this to a new level, as a prerendered cutscene is not bound by the player's hardware, allowing anyone to enjoy it. Even if you can't afford the best computer parts, you can still enjoy many games on a regular computer if you dial down the settings. Alternatively, you can buy a console for $200-500. Microsoft's Xbox, for example, comes in a $500 version (Xbox Series X) and a $300 version (Xbox Series S). Although the more expensive one is more powerful, the difference isn't huge, because games are designed for the baseline system and the additional improvements, while usually noticeable, are rarely profound. Your returns diminish further when you compare a $500 Xbox Series X to a high-end gaming PC, which will cost between $1200 and $2000. Yes, the PC version will look better, but the differences are usually mild.
Additionally, modern story-driven games are usually, but not always, designed so that you can complete the story regardless of your skill level. This is implemented through a variety of techniques such as level scaling, adjustable difficulty, strategically placed predetermined encounters, or simply abolishing the leveling system entirely. As is true for literature and film, narrative-driven videogames are not for everyone. Narrative-driven games are for people who want a masterful story. Some people think cartoons are only for children (they're dead wrong; watch Avatar). Similarly, some people dismiss videogames as childish, or say they'll never be any good, so why bother? I believe that even the most technologically inept, even technophobic, people can and will enjoy videogames if they have the guts to jump in headfirst and give it an honest go. Yes, unless you want to play visually unappealing games, you will likely need at least a basic console, which will cost you $300 new or $100-200 for an old one, plus around $40-70 per game. This entry barrier can be softened with services like PlayStation Plus, PlayStation Now, and Xbox Game Pass ($10-15 per month). These are Netflix-style subscriptions that in one way or another give you access to tons of games for a low monthly cost. Xbox All Access takes this a step further by giving you a console and Game Pass together for a monthly fee ($25-35). A comparison could be drawn to reading The Lord of the Rings for the first time. It is a very intimidating endeavor due to the size of the volumes as well as their advanced vocabulary; however, once you read them you will be astounded by their excellence. What are you waiting for? Go on, try a game! If you need suggestions, ask a gamer friend, look up the best story-driven games, or check out my list of the best games for players new and old. There's plenty to choose from!
My first real encounter with embedded computers was all the way back in 2015 on a Raspberry Pi 2B, which packed a blazing quad-core 900MHz CPU and a whole gigabyte of LPDDR2 RAM. I remember booting up Raspbian Wheezy and playing games like "Squirrel Eat Squirrel" and "Minecraft: Pi Edition". I also wrote my very first "Hello World!" program on that Pi. I used that Pi 2B to learn programming fundamentals like variables, if-else statements, while loops, for loops, input statements, lists, and functions. Despite their simplicity, I found it thrilling to write these programs. I derived joy from writing stupidly long for loops and naming my variables after memes.
I remember the CPU would get warm or sometimes even hot to the touch, so I put some heatsinks on it because I thought it'd run faster. I didn't know that my simple programs never pushed it anywhere near throttling, and what I thought was hot was barely breaking a sweat. It was fascinating to see a whole computer on such a tiny, cheap board; I was blown away! Sure, it wasn't as powerful as a laptop or desktop, but Raspbian was so light and my programs were so simple that it didn't matter. Since the original Raspberry Pi was so tight on RAM and had such a weak CPU, Raspbian didn't come with Chromium, but when Raspbian PIXEL arrived it was a total game-changer: Chromium was available on the Pi! The Pi was now a real desktop replacement, probably almost as gutsy as my Chromebook, which is to say not very. Nevertheless, it seemed magical to me. The Raspberry Pi was my gateway system: it helped me learn to program, introduced me to embedded systems, and taught me that computers can be fun. Nowadays, Raspberry Pis are far more powerful than my old 2B and come in much smaller sizes, but they haven't fundamentally changed. Besides, the old 2B is still great for applications with limited thermal headroom; every Pi has its place. This summer I'm going to make some firebending gloves, and to do that I'm going to use either a Raspberry Pi Zero or a Raspberry Pi Pico to read an accelerometer and control a valve that releases bursts of fuel. Both of these boards are even weaker than my old 2B and pack far less memory, but that's not a problem. They're extremely cheap and have very low power consumption, which is why I'll use them. Kudos to the Raspberry Pi Foundation for making coding and embedded tech accessible to anyone; my life wouldn't be the same if I hadn't first booted up my Raspberry Pi 2B.
I think everyone should give a Raspberry Pi a shot. Whether you want to animate a cool project, learn to program, build your own smart home, or just try one out as a desktop computer, you never know what you'll discover! Look out for more Pi-related articles in the future.
We're right in the middle of Nvidia's GPU Technology Conference, GTC, and I recently learned about something amazing: the RAPIDS software suite. RAPIDS is a set of GPU-accelerated Python libraries, including cuDF, cuML, and cuGraph, that work alongside CuPy and Dask and are meant to replace popular data science libraries like NumPy, Pandas, Scikit-Learn, and NetworkX. The RAPIDS libraries are mostly drop-in replacements and support many of their counterparts' core functions. RAPIDS is revolutionary because it uses the Apache Arrow columnar memory format to keep data in GPU memory and share it between libraries, limiting the number of transfers between system and GPU memory. Those transfers are one of the primary weaknesses of GPU acceleration due to the latency of the PCIe bus.
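RAPIDS handles the "keep it on the GPU" part for you across libraries. The idea itself can be sketched with plain CuPy: copy the data to the device once, do every intermediate step there, and copy only the final result back. This is a minimal sketch of that principle, not RAPIDS or Arrow code; `standardize` and `to_host` are hypothetical helpers, and the NumPy fallback exists only so the sketch runs without a GPU.

```python
try:
    import cupy as xp  # GPU arrays; needs an NVIDIA GPU with CUDA
except ImportError:
    import numpy as xp  # CPU fallback so the sketch runs anywhere


def to_host(array):
    """Return a NumPy array, copying out of GPU memory only if needed."""
    return array.get() if hasattr(array, "get") else array


def standardize(values):
    """Scale values to zero mean and unit standard deviation.

    The data crosses the PCIe bus twice, once in and once out. The mean,
    the standard deviation, and every intermediate array stay on the device.
    """
    data = xp.asarray(values, dtype=xp.float32)  # single host -> device copy
    centered = data - data.mean()
    scaled = centered / data.std()
    return to_host(scaled)  # single device -> host copy
```

The alternative, copying back to the host after every step, pays the bus latency repeatedly, which is the overhead the paragraph above describes.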
I worked through some tutorials and witnessed speedups of between 4x and 10,000x on various tasks involving a million to a billion data points versus identical CPU-only operations. I already had experience using CuPy, but with a mere ten thousand data points I wasn't anywhere near parity with CPU code. What I didn't know is that with cuDF or Dask, you can mix datatypes and even process strings! Turing SMs are capable of processing INT32 at the same time as FP32, and can even handle some FP64 if it must be done, albeit with a massive performance hit. This lets them take full advantage of RAPIDS' mixed datatypes. RAPIDS even lets you do quick and efficient clustering: K-means if you know how many clusters your data should fit into, or DBSCAN if you don't. Network topology problems? RAPIDS can chew 'em up and spit 'em out like nothing. The same underlying CUDA primitives can even be used to accelerate databases with BlazingSQL. You really can breathe life into old hardware with only a GPU!
Back in 2005, a new type of processor was prototyped with the goal of powering the most advanced console ever seen as well as the next-generation supercomputers: the Cell Broadband Engine. The Cell Broadband Engine, also known as the CBE or Cell, was a revolutionary design made by STI, an alliance formed by Sony, Toshiba, and IBM with the goal of building the processor of the future. Their design took a single high-performance IBM PowerPC core clocked at 3.2GHz with support for two simultaneous threads and put it on the same die as eight special cores designed with a single goal: to execute numerical calculations as fast as possible. The result was the Cell Broadband Engine, a heterogeneous processor that used a single CPU core, the Power Processing Element (PPE), for general-purpose tasks and for supplying data to the eight Synergistic Processing Elements (SPEs), which provided supercomputer-level performance at raw number crunching.
The resulting chip was used to power the PlayStation 3, coupled with a respectable 256MB of blazing-fast XDR RAM as well as the Nvidia RSX "Reality Synthesizer" GPU. The other specs were impressive, but the Cell was the real star of the show. After the PlayStation 3 debuted on November 11, 2006, IBM set its sights on something bigger: making an improved Cell processor to power the next generation of servers. The resulting chip, the PowerXCell 8i, was used in the IBM BladeCenter QS22 server as well as the IBM Roadrunner, the first petascale supercomputer. It looked like the future had arrived. Except, in 2013 the IBM Roadrunner was dismantled and the PlayStation 4 launched with a conventional x86-64 processor. No new developments were made to the Cell Broadband Engine after the PowerXCell 8i. Why? My theory is based on one of the PlayStation 3's shortcomings as well as some digging I did into IBM documentation: the Cell Broadband Engine created a positively grueling experience for developers, and Nvidia's CUDA-enabled GPUs were faster and could be integrated into existing systems. In addition to the daunting obstacles Cell's API posed to developers, Cell also saw limited adoption and was only available in high-end servers and the RAM-starved PS3, neither of which was suitable for the masses. In this sense it was inevitable that CUDA eclipsed Cell: CUDA was accessible to anyone with a computer and $350 to buy an Nvidia GeForce 8800 GT, which was almost a match for the Cell in raw throughput and had double the memory of the PS3. Since then CUDA has taken off, and we never looked back.
If you've been keeping up with my articles, you probably know that I think now is a terrible time to build a PC. GPUs are in short supply and new consoles offer an unbeatable value proposition. However, if you just have to build a PC, here is a guide to building something that will game at least on par with a PS5 or Xbox Series X without costing something outrageous.
CPU - $280
The AMD Ryzen 7 3700X is an octa-core processor with good single-core performance, excellent multi-core performance, and low power consumption. This will allow for a stable 60fps when paired with a sufficiently powerful GPU for a given resolution. Since the current-gen consoles also use eight Zen 2 cores, we can rest assured that we should not be CPU-bound unless we are playing at low resolutions or with an extremely powerful GPU. Alternatively, a Ryzen 5 5600X is a $20 upgrade that nets you significantly improved single-core performance and support for Smart Access Memory (read: a minor performance boost) in exchange for having two fewer cores, meaning that it may not age as well for gaming and will not be as good at tasks such as building software, video encoding, or video decoding. Most AMD Ryzen CPUs come with a decent stock cooler, so you don't have to worry about buying one.
Motherboard - $150
For the motherboard, basically any B550 or X570 motherboard will work. The 3700X does not use much power, so we don't have to worry about power delivery. Using a 500-series motherboard means you won't have to flash the BIOS to work with Ryzen 3000 CPUs. Additionally, you will have support for PCIe 4.0, which boasts extremely high bandwidth, enabling future GPUs to operate at maximum performance.
RAM - $100
Ryzen processors benefit from high-speed RAM, although there is no reason to go above DDR4-3600 due to limitations of AMD's Infinity Fabric. DDR4-3200 CL16 is the most common RAM out there and offers good speeds at a low cost. Some kits can run higher than their rated frequency; for example, I run my DDR4-3200 CL16 kit at DDR4-3266 CL16, and it's rock solid. Do not get 8GB, and make sure to get a dual-channel kit (two sticks), as that doubles bandwidth. 16GB is enough, although if you have $50 to spare I highly recommend upgrading to 32GB.
GPU - $300-500
GPU is where you have the most liberty to customize this build.
The best choices include the GeForce RTX 3060, the GeForce RTX 3060 Ti, and the Radeon RX 6700XT. The two GeForce cards offer better performance in games that heavily utilize ray tracing. Additionally, they support DLSS, an Nvidia-exclusive technology that many games supporting ray tracing employ to offset the performance hit. The RTX 3060 Ti at $400 is more expensive but also much more powerful than the $330 RTX 3060, and it offers performance in the same league as the RTX 3070 for $100 less. The RX 6700XT is the most expensive card listed at $480, although tests show that it crushes the 3060 Ti and is often on par with the RTX 3070 in purely rasterized games. Another nice feature of the RX 6700XT is its 12GB of VRAM, which should aid longevity. If you want to play the latest games in beautiful detail, go for the RTX 3060 Ti, or the RTX 3060 if money's tight. If, however, you want to play classics like The Witcher 3 or Final Fantasy XV in glorious 4K at 60fps, the RX 6700XT will get you there. In short, if you like to play older games or aren't that interested in ray tracing, get the RX 6700XT; otherwise, go for the RTX 3060 or 3060 Ti.
PSU - $100
Get a decent PSU, like the EVGA SuperNOVA 650 GT. You want around 650W for a GPU like the RX 6700XT, but you can get away with less for the RTX 3060 or RTX 3060 Ti. EVGA is very reliable for PSUs. You don't need to spend crazy money on a PSU, but you shouldn't cheap out and buy a $40 unit.
Case - $180
The case is purely up to you. This is what you'll be looking at, so you don't want something hideous. I recommend the Corsair 760T: it has great cooling, looks sharp, and is easy to build in. The two drawbacks are that it is a little pricey, over $200 thanks to the Corsair tax, and it doesn't come with tempered glass side panels.
However, I've had no issues with rattling (aside from one fan that just needed oil), it comes with OK dust filters, and the thermals are excellent. Six drive bays are more than enough, although I've already used three of them plus an M.2 slot. Alternatively, if you want a cheaper case, the Lian Li Lancool II comes with tempered glass side panels, metal on both the inside and outside, and some fans for $100. The Lancool II only has 120mm fans instead of 140mm fans like the 760T, and there is no room for a DVD drive.
SSD - $100
There are tons of SSDs out there at wildly varying prices, although you should expect to spend about $100 to get a decently sized SSD from a reputable manufacturer such as Crucial, Seagate, Inland, Western Digital, or even Samsung if you're lucky. You don't need the fastest SSD, and even a DRAM-less model will do if money's tight, but whatever you do, don't use a mechanical hard drive as your main drive. It isn't worth it: HDDs are slow, prone to failure, noisier, and use more power. If you need a ton of space on the cheap, get a small SSD and use a secondary HDD to store games, files, etc.
Final price - ~$1300
There you have it, a PC that balances cost and performance to deliver an excellent gaming experience. This base should serve you well whatever you want to do, whether it's esports or AAA gaming. No, you won't be able to max out everything at 4K and hit 60fps. No, you might not be able to pump out 500fps in some shooter. But you will still be able to play at 4K if you finagle some settings, and you should be able to break 240fps playing Valorant, Minecraft, or even Fortnite if you turn down some settings. Consoles often don't run at max settings, but they still manage to make games look good. This PC will be in a similar situation. However, the monitor plays a significant role in making eye candy. I highly recommend a 1440p monitor with at least a 75Hz refresh rate. Happy gaming!
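For anyone double-checking the ~$1300 total, here is a quick tally using the prices quoted in the guide above. It is only arithmetic on those listed figures (the dictionary keys are my own labels); real prices will vary. With the base parts adding up to $910, the three GPU choices land between $1,240 and $1,390.

```python
# Prices quoted in the build guide, in US dollars
BASE_PARTS = {
    "CPU (Ryzen 7 3700X)": 280,
    "Motherboard (B550 or X570)": 150,
    "RAM (16GB DDR4-3200)": 100,
    "PSU (650W)": 100,
    "Case": 180,
    "SSD": 100,
}

# The three GPU recommendations and their quoted prices
GPU_OPTIONS = {
    "RTX 3060": 330,
    "RTX 3060 Ti": 400,
    "RX 6700XT": 480,
}


def build_total(gpu_name):
    """Total cost of the base parts plus the chosen GPU."""
    return sum(BASE_PARTS.values()) + GPU_OPTIONS[gpu_name]


if __name__ == "__main__":
    for gpu_name in GPU_OPTIONS:
        print(f"{gpu_name}: ${build_total(gpu_name)}")
```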
Nvidia RTX has been out for almost three years now. I've had my RTX 2080 Super for a year and a half, and now I'm going to look back on real-time ray tracing's origin story.
I jumped on the ray tracing bandwagon as a bit of an early adopter. Nvidia had introduced two new technologies, DLSS and RTX, and had released its Super cards, while AMD stuck to traditional rasterization but launched RDNA, a brand-new architecture dedicated to gaming that replaced the aging GCN architecture. I went with Team Green for the GPU, got a 2080 Super, and paired it with a Ryzen 7 3800X. It was glorious to turn on ray tracing. Shadow of the Tomb Raider was one of the few games that supported ray tracing; it featured ray-traced shadows and looked stunning. The details were beautiful and the lighting and shadows were lifelike, all on top of phenomenal gameplay and setting. Minecraft showed a night-and-day difference; it looked like an entirely different game! The water actually refracted light so that it looked shallower than it was! Fast forward to the present day, and things are quite different. GeForce RTX GPUs have been on sale for almost three years, next-gen RTX GPUs have launched, RDNA2 debuted with Ray Accelerators to support real-time ray tracing, and the PlayStation 5 and Xbox Series X|S support real-time ray tracing as well. Is ray tracing finally commonplace? Unfortunately, the pandemic caused the gaming industry to explode at the same time as pandemic restrictions caused chip shortages, leading to demand severely outstripping supply. Do I think it was a good decision to buy a first-generation RTX GPU? Yes, I am so glad that I went with an RTX 20-Series GPU instead of a GTX 16-Series, GTX 10-Series, or RX 5700XT. RTX is a revolutionary technology, and to make it work, Nvidia had to make their cards tremendously powerful, add special RT Cores to accelerate ray tracing calculations, and on top of all that borrow Volta's Tensor Cores and create the DLSS algorithm to run on them.
The RTX 20-Series GPUs are not perfect: ray tracing causes them to take a substantial performance hit, although it is definitely possible to enable it at high resolutions and high settings on more powerful GPUs like the RTX 2080, 2080 Super, and 2080 Ti. Even the RTX 2060 can ray trace at 4K with some settings turned down thanks to DLSS 2.0.
That's right: the fight of the century! Not a fourth round of Ali-Frazier from beyond the grave, we're talking PC vs. Console! New GPUs and new consoles just dropped late last year, and now it's starting to get to the point where you might be able to snag one of them soon.
6/6. PC: You can upgrade components in the future
My RTX 2080 Super is powerful: it beats out the PS5 and goes toe-to-toe with the Xbox Series X. However, once developers start adding more visual effects, such as more detailed ray tracing or hair models, you'll need more power to fully enjoy them. Unfortunately, these effects will have to be dialed back to run on old console hardware. Final Fantasy XV runs on the original PS4 and Xbox One, but it only runs at around 60-70 FPS at 1440p (or 4K with DLSS) with all the settings maxed out on my 2080 Super. This is how I know there must have been some cuts to make that game run on the Xbox One. Say my 2080 Super starts getting long in the tooth: I can swap it out for a newer card. You can't do that on a console.
5/6. Console: GPUs alone cost more than a new console
$400 buys you a PS5 Digital Edition. What GPU does that buy you? An RTX 3060 Ti, if you can find one. Once you manage to get one, you still have to build a system around it. CPU, PSU, case, drives, motherboard, and now we're talking at least a full grand! Sure, the 3060 Ti is more powerful than the PS5, but not 2.5 times as powerful (probably not even 1.5 times). Save some money and buy a console. The Xbox Series S is even cheaper at $300.
4/6. PC: You get a PC, not just a gaming machine
You heard it here first: you do things other than gaming on your PC. WOW! Documents load quickly, boots are fast, you can have lots of browser tabs open, and bloatware doesn't affect you as much. These all come from generous amounts of RAM, SSDs, and powerful CPUs. Just because it's a gaming PC doesn't mean it isn't a fast work PC. Now you can comfortably abuse Chromium's tab feature (we know you do).
3/6. Console: Optimized games
Consoles don't have Skype or Chrome running in the background. They don't have Windows Update chewing up bandwidth, disk, and RAM. Spec-wise, my PC should crush the PS5 and the Xbox Series X, thanks to the 2080 Super's Tensor Cores and superior RT Cores. However, I am not so sure this will be true for the Xbox, due to its more streamlined software-hardware stack compared to Windows 10 gaming PCs.
2/6. PC: If you do any computationally intensive programming, build immediately
This is the biggest surprise boon I got from my PC. It let me do AI work, heavy-duty numerical methods, and quickly build software from source. I never could have done this stuff efficiently on the aging titan that was my Optiplex 960. For a machine built in 2008, 8GB of RAM and four cores is insane. For 2019, however, it was a little on the low end for what I wanted to do. Now I can tear through models that I wouldn't have dared run on my dinosaur. Not that they'd even run there without some finagling: TensorFlow's prebuilt binaries require the AVX instruction set, which the Optiplex's CPU lacks. I admit, my example is quite extreme; AVX has been around since 2011. However, even with AVX, the models wouldn't have run well: I multiplied my raw GPU power by approximately 35 times, and my CPU is about eight times as powerful in the best-case scenario. Single-core speeds are about 2-3 times better, which makes a massive difference when experimenting with software, since during experimentation you usually don't take the time to multi-thread.
1/6. Console: The Couch Factor
You can play video games on your couch. You can watch movies on your couch. Games, movies, on the couch. Games. Movies. Couch. Firing up a game as you sink into the cushions, hands wrapped around the controller, a device designed with a single purpose: comfortably interfacing with games. You put your feet up and have a great time. Who would want to watch a movie on their computer? Ugh, so plebeian. Real men of culture use the big screen for their motion pictures. My Xbox 360 is still going strong in this role (no red ring of death!) and does a bang-up job playing Avatar: The Last Airbender through a 1080i projector (I know. I live in 2005).
There you have it, three reasons for and three against buying a console and putting off building a PC. As you can see, it depends on who you are. For programmers, a PC makes a lot of sense. For casual gamers, however, a console can't be beat in terms of convenience and price/performance ratio.
At one point or another you may have found yourself wanting to fill in missing data points in a data set. This seems like a simple problem on the surface, but it quickly gets complicated. One way to solve this problem is with interpolators. There are a variety of interpolators, but for one-dimensional data the simplest type is a polynomial interpolator. A polynomial interpolator takes a set of inputs and their corresponding outputs and generates a polynomial through them with the aim of inferring data between the points. The resulting polynomial is of order at most n-1, where n is the number of data points. The result always goes through the input data points, although it will not always produce a smooth or logical transition between them. This is because the "smooth transition" we are looking for is a construct of our intuition, something a computer does not have. Here is an implementation of polynomial interpolation using Python 3 and the CuPy library. CuPy is a CUDA-accelerated library that implements a large subset of the NumPy API, aiding massively parallel applications.
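Written out, the polynomial interpolator described above is usually expressed in Lagrange form. Given n known points (x_0, y_0), ..., (x_{n-1}, y_{n-1}):

$$
P(x) = \sum_{i=0}^{n-1} y_i \, L_i(x), \qquad
L_i(x) = \prod_{\substack{j=0 \\ j \neq i}}^{n-1} \frac{x - x_j}{x_i - x_j}
$$

Each basis polynomial L_i(x) equals 1 at x_i and 0 at every other known x_j, which is why P(x) passes through every data point. Each L_i(x) is also a product of n-1 linear factors, so P(x) has order at most n-1. In the implementation that follows, `Ln` computes L_n(x) and `interpolator` computes the weighted sum P(x).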
```python
import cupy as cp  # comment out this line if you do not have CuPy...
# import numpy as cp  # ...and uncomment this one to run on NumPy instead
import matplotlib.pyplot as plt


def to_host(a):
    # CuPy arrays live in GPU memory and must be copied back with .get()
    # before Matplotlib can plot them; NumPy arrays are already on the host
    return a.get() if hasattr(a, "get") else a


def Ln(x_knowns, n, x):
    # Inner loop of the interpolator: the n-th Lagrange basis polynomial at x,
    # the product over j != n of (x - x_j) / (x_n - x_j)
    numerators = cp.subtract(x, x_knowns)
    xn = x_knowns[n]
    denominators = cp.subtract(xn, x_knowns)
    denominators[n] = 1.0  # avoid dividing by zero in the j == n term
    ratios = cp.divide(numerators, denominators)
    ratios[n] = 1.0  # leave the j == n term out of the product
    return cp.prod(ratios)


def interpolator(x_knowns, y_knowns, x):
    # Outer loop: weight each basis polynomial by its known y value and sum
    aggregate = 0
    for i in range(y_knowns.size):
        aggregate += y_knowns[i] * Ln(x_knowns, i, x)
    return aggregate


def array_function(x):
    # Function used to generate test data: e^-(x^2)
    return cp.exp(cp.multiply(-1.0, cp.square(x)))


# Number of points used to draw the curves. The original listing used this
# name without defining it; 200 is an assumed value that gives smooth curves
infill_resolution = 200

# Generate the test domain using linspace; 10 points mean a polynomial of order 9
x_test = cp.linspace(0, 3.1416 * 2, num=10, endpoint=True, dtype="float32")
val_range = float(cp.amax(x_test) - cp.amin(x_test))
val_min = float(cp.amin(x_test))

# Fill data for plotting in between (and beyond) the test points, purely for
# visualization; it does not affect the polynomial itself
x_fill = cp.linspace(val_min - 1, val_min + val_range + 2, num=infill_resolution,
                     endpoint=True, dtype="float32")

# This is the y data used to define the interpolation polynomial
y_test = array_function(x_test)

# Run the interpolator on the test data to produce high-resolution fill data
# for graphing; float() brings each result back to the host as a Python number
y_fill = [float(interpolator(x_test, y_test, x_fill[i])) for i in range(infill_resolution)]

# Plot the data points interpolated about
plt.scatter(to_host(x_test), to_host(y_test))
# Plot the real function
plt.plot(to_host(x_fill), to_host(array_function(x_fill)))
# Plot our interpolated guess
plt.plot(to_host(x_fill), y_fill)
# Show the plot
plt.show()
```
The code should produce the graph above.
If you do not have a CUDA-enabled GPU, you can use Numpy instead by commenting out the first line and uncommenting the second line.
The blue line represents the real function, while the orange line is the polynomial produced by interpolation. Notice that the lines match quite well between the known data points, but the polynomial becomes unstable very quickly outside them. Projecting beyond the data is actually not interpolation; it is called extrapolation, for which there are other, more effective methods than the polynomial method. Ten data points were used, so the resulting polynomial is of the 9th order, which is quite high. These high exponents cause instability because they quickly grow out of control.
I remember being told in AP Computer Science never to use single-precision floats and to always use doubles, as computers are so fast nowadays, with so much memory, that it doesn't matter. That seems to be the general advice; the same dogma came up in a conversation about numerical methods I had just a few hours before writing this article. If you'd asked me a few months ago, I would have said single precision might not be a bad idea and half precision even more so. Well, I now say only half of that statement is true, and it's not for the reasons I might have guessed. For most things, save this specific example of numerical methods, single precision is plenty, and half precision would probably even be enough in cases like AI. The catch is that my numerical methods so far run on a CPU and are 90% sequential. Why is that a problem? Shouldn't half precision still be lighter than full precision, since there are fewer bits?
Yes! But also no. The issue lies in the x86 architecture that modern x86-64/AMD64 processors are built on. Traditionally, x86 CPUs have had no native instructions for half-precision arithmetic (the F16C extension only converts between 16-bit and 32-bit floats), so float16 values have to be converted to float32 before any math can happen, which adds work instead of saving it. Double precision, on the other hand, has been supported natively by x86 floating-point hardware for decades, which is why it doesn't take much of a hit on CPUs. Moving over to GPUs, things get a little messier. An Nvidia Tesla P100 has 9.526 teraflops of FP32 compute performance. A GTX 1080 Ti has 11.34 teraflops of FP32. My RTX 2080 Super has 11.2 teraflops of FP32 performance. That means my GPU is better than a P100 and about equal to a 1080 Ti, right? Yes and no! Let's look at double precision and see if that sheds any light. My 2080 Super and the 1080 Ti both deliver approximately 350 gigaflops of FP64 performance, a 1:32 ratio. The P100? It runs a cool 4.7 teraflops, half its single-precision spec. What's going on? Modern flagship Tesla GPUs have dedicated FP64 units that boost their FP64 speeds. Gaming cards lack these, and as such the two GeForce cards pale in comparison. Now let's see how FP16 goes down. The P100 scores 19.05 teraflops, my RTX 2080 Super gets 22.5 teraflops, and the 1080 Ti gets ... 177 gigaflops. What? The first two doubled, but the 1080 Ti dropped at a ratio of 1:64, an even worse hit than double precision! The 1080 Ti lacks a fast way to process FP16 and has to do some voodoo to get it to work at all. The P100 is designed to process half precision, and the 2080 Super has Tensor Cores, something new with the Volta and Turing architectures that improves upon basic half-precision cores. Things aren't so black and white once you look closely.
A bit of trivia/bonus for any early adopters of RTX out there: Turing has one more trick up its sleeve, simultaneous INT32 and FP32 processing. Each Turing streaming multiprocessor (SM) supports concurrent integer and floating-point math, something Ampere traded for a mix of floating-point cores and dual-purpose cores.
That's another reason why the 2080 Super, and even the basic 2080, can beat the 1080 Ti. It's not always as simple as it seems, but in our modern era basic intuition proves correct: fewer bits = more speed!
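Fewer bits buy speed, but they also cost accuracy, and that trade-off is where the "always use doubles" advice for numerical methods comes from. The minimal NumPy sketch below adds 0.1 to a running total ten thousand times, rounding to the chosen type after every step; `accumulate` is a hypothetical helper written only for this illustration. The exact answer is 1000. In float16 the total gets stuck at 256: above 256, adjacent float16 values are 0.25 apart, so adding roughly 0.1 always rounds back to the same number. float32 lands close to 1000 but not on it, and float64 agrees with 1000 to many decimal places.

```python
import numpy as np


def accumulate(dtype, step=0.1, count=10_000):
    """Add `step` to a running total `count` times, rounding to `dtype` after every addition."""
    total = dtype(0)
    increment = dtype(step)
    for _ in range(count):
        total = dtype(total + increment)
    return float(total)


if __name__ == "__main__":
    for dtype in (np.float16, np.float32, np.float64):
        print(dtype.__name__, accumulate(dtype))
```

This is the sequential, error-accumulating pattern that many CPU-side numerical methods follow, which is why precision matters more there than in workloads like AI.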
Daniel
I'm a software engineer, volunteer IT support, amateur blogger, casual gamer, and tech enthusiast. I also love cars and the great outdoors.
Archives
May 2021