Questions tagged [cuda]
CUDA is a parallel computing platform and programming model for Nvidia GPUs (Graphics Processing Units). CUDA provides an interface to Nvidia GPUs through a variety of programming languages, libraries, and APIs.
55 questions
2 votes, 1 answer, 92 views
RAII Wrapper For CUDA Pointers
I was recently working on my CUDA wrappers library, and this particular class is one of the oldest pieces of code in the entire project. Since that time, I added tons of other features (for example <...
12 votes, 1 answer, 799 views
Strongly-typed CUDA device memory
When I discovered that CUDA device memory was represented by plain old void* I was horrified by having to deal with C-style type safety and resource ownership (i.e. ...
7 votes, 1 answer, 265 views
RAII Wrapper For Registering/Mapping CUDA Resources
I've implemented a resource management class for CUDA interop using RAII to ensure exception safety. The goal is to handle the registration/unregistration and mapping/unmapping of graphics resources (...
1 vote, 0 answers, 95 views
Sphere Generation System With CUDA-OpenGL Interop
This is a follow-up to my previous question; this one focuses more on the actual tessellation pipeline.
What I changed since the previous question
Implemented the async sphere ...
1 vote, 0 answers, 67 views
CUDA Sphere Tessellation With Support For LOD
I was working on my version of "Universe Sandbox", and the first thought that comes to mind is "where the hell are my planets?" I decided that loading models sucks and made this thing. It'...
8 votes, 1 answer, 291 views
CUDA/NVRTC context switching function
I've implemented a feature in my C++ fractal explorer application to switch between CUDA and NVRTC. The main reason for the NVRTC/Driver API context is to support runtime compilation of custom CUDA ...
15 votes, 1 answer, 2k views
CUDA Mandelbrot Kernel
I'm looking for feedback and suggestions on improving the performance and quality of my CUDA kernel for rendering the Mandelbrot set. I've implemented a "ping-pong" style coloring and ...
3 votes, 1 answer, 103 views
Tracking total iterations in CUDA fractal renderer
I'm developing a fractal renderer in CUDA and need advice on tracking the total number of iterations performed during rendering. This is important for real-time dragging and zooming performance.
...
6 votes, 0 answers, 169 views
Fractal Rendering on GPU with CUDA
I am writing a fractal renderer using CUDA, SFML, and C++. I recently optimized it to use less memory, and now I am going to optimize the actual fractals, because for some reason, it is the most holding back ...
2 votes, 1 answer, 85 views
I have a PyTorch module that takes in some parameters and predicts the difference between one of its inputs and the target
One instance of the following module uses almost 75% of my VRAM, so I was wondering how I could reduce that without slowing down runtime too much. The code is below:
...
3 votes, 1 answer, 129 views
PyTorch code running slow for Deep Q learning (Reinforcement Learning)
I'm a new student in reinforcement learning. Below is the code that I wrote for deep Q learning:
...
1 vote, 0 answers, 252 views
A CUDA kernel for a matrix product as outer product vectors
To multiply the matrices A and B using the outer product of vectors, we can express each row of matrix A as a row vector and each column of matrix B as a column vector. Then, we can take the outer ...
2 votes, 1 answer, 173 views
Applying cointegration function from statsmodels on a large dataframe
I need to apply the coint function from the statsmodels library to 207 time series with 1397 points each, two by two.
Currently, it takes 35-40 minutes on my computer with an Intel 24 Cores ...
5 votes, 3 answers, 237 views
Summation over different determinants that are independently computed using CUDA
Do you have any suggestions for improving the efficiency of the code below?
I believe that better optimization can be implemented in the GPU function cuKer_sum, which is located in the ...
5 votes, 1 answer, 223 views
CUDA kernel to compare pairs of matrices
My first time writing anything significant in CUDA.
This kernel takes two arrays representing square matrices and compares them pair-wise. It takes into consideration large input arrays, and ...