
Integrating LLMs with ComfyUI (2)

Comparing four LLMs

Installing new LLMs

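I already had llava-phi3 3.8B and minicpm-v 8B installed as LLMs for generating prompts from images, and I have now added llama3.2-vision 11B and bakllava 7B. I compared how the prompts and the resulting images change depending on which LLM is used.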
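For readers who want to try the image-to-prompt step outside ComfyUI, here is a minimal sketch. It assumes the models are served by a local Ollama instance on its default port; the post does not say which runtime is used, so treat the URL, the function names and the default instruction text as illustrative assumptions rather than the workflow's actual setup.

```python
import base64
import json
import urllib.request

# Assumption: a local Ollama server on its default port.
# The post does not name the runtime behind the workflow.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_caption_request(model: str, image_bytes: bytes, instruction: str) -> dict:
    """Build the JSON body for one non-streaming image-description request."""
    return {
        "model": model,
        "prompt": instruction,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_image(model: str, image_path: str,
                   instruction: str = "Describe this image in detail.") -> str:
    """Send the image to the model and return the generated description."""
    with open(image_path, "rb") as f:
        payload = build_caption_request(model, f.read(), instruction)
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With the server running and a model pulled, a call such as `describe_image("llava-phi3", "a.jpg")` would return the text that is then used as the image prompt.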

Generation time comparison

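With the workflow I built, generation time depends on which LLM is used. Using the same source image and the same model, I compared the generation time of the first image with that of the second and later images.
llava-phi3 3.8B: about 50 seconds for the first image, about 21 to 23 seconds from the second image on
minicpm-v 8B: about 1 minute for the first image, about 32 to 35 seconds from the second image on
bakllava 7B: about 50 seconds for the first image, about 27 to 28 seconds from the second image on
Generation times for these three LLMs varied somewhat but were roughly comparable.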
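The first-image versus later-image measurement can be reproduced with a small timing helper. This is only a sketch: `generate` is a hypothetical stand-in for whatever triggers one full generation run, and the post does not describe how the times above were actually measured.

```python
import time
from statistics import mean
from typing import Callable


def time_runs(generate: Callable[[], object], runs: int = 4) -> dict:
    """Time `runs` calls and report the first call separately from the rest."""
    if runs < 2:
        raise ValueError("need at least two runs to separate first from later ones")
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        generate()  # hypothetical callable: one complete generation
        durations.append(time.perf_counter() - start)
    later = durations[1:]
    return {
        "first": durations[0],
        "later_min": min(later),
        "later_max": max(later),
        "later_mean": mean(later),
    }
```

Reporting the first run on its own keeps one-time costs, such as loading the model, from skewing the figures for later images.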
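llama3.2-vision 11B did not run stably, perhaps because of its size. Checking in Task Manager, when I started generation while dedicated GPU memory usage was around 1.0GB, an image took about 2 minutes; when I started while usage was 1.2GB or more, it took over 5 minutes. The prompt generation step is probably falling back to the CPU because GPU memory runs short, which would explain the long time. The image generation that follows runs on the GPU and was fast.
This LLM is hard to use in my environment with 8GB of GPU memory, so I deleted it after the test.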
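The Task Manager check could also be scripted before starting a run. The sketch below assumes an NVIDIA GPU with `nvidia-smi` on the PATH, which the post does not state. `needed_mib` is a hypothetical estimate the reader has to supply, since the post only reports how much memory was already in use (about 1.0GB versus 1.2GB or more), not how much the model needs.

```python
import subprocess
from typing import Tuple


def parse_memory_line(line: str) -> Tuple[int, int]:
    """Parse one 'used, total' line (values in MiB) from nvidia-smi's CSV output."""
    used, total = (int(part.strip()) for part in line.split(","))
    return used, total


def query_gpu_memory() -> Tuple[int, int]:
    """Return (used_mib, total_mib) for the first GPU reported by nvidia-smi."""
    output = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_memory_line(output.splitlines()[0])


def has_headroom(used_mib: int, total_mib: int, needed_mib: int) -> bool:
    """True if the free memory covers the caller's estimated requirement."""
    return total_mib - used_mib >= needed_mib
```

Calling `has_headroom(*query_gpu_memory(), needed_mib=...)` before a run gives a rough warning that the prompt step may end up on the CPU.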

Prompt and image comparison

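I generated images with everything identical except the LLM setting. Because the prompt is regenerated on every run before the image is generated, please treat the results below as a rough reference only.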
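If the run-to-run variation in the prompts gets in the way of a comparison, one option is to pin the sampling settings of the LLM. The sketch below assumes the LLM is called through Ollama's generate API, whose request body accepts an `options` object with `seed` and `temperature`. Whether the ComfyUI node used in this workflow exposes these settings is not stated in the post, and the helper name is hypothetical.

```python
import copy


def with_fixed_sampling(payload: dict, seed: int = 42, temperature: float = 0.0) -> dict:
    """Return a copy of an Ollama request body with seed and temperature pinned."""
    pinned = copy.deepcopy(payload)  # leave the caller's dict untouched
    options = pinned.setdefault("options", {})
    options["seed"] = seed
    options["temperature"] = temperature
    return pinned
```

Pinning these values trades away the variety that regenerating the prompt provides, so it is mainly useful when comparing models side by side.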

Source image A

This image was generated with SD1.5. The original model was chilled_remix_v2.
The model used this time is pinkiepiePonyMix_v36Fp16.

a.jpg

llava-phi3 3.8B

The image features a young woman in a professional setting. Here's a detailed description:

  • The central figure is a young woman with short, dark hair and bangs. She has an air of confidence and professionalism.
  • She is dressed in a black blazer over a crisp white blouse, paired with a contrasting brown skirt. A black bow tie adds a touch of formality to her outfit.
  • The woman stands against the backdrop of an office setting, which includes a desk and shelves. The exact contents of the shelves are not visible in the image.
  • Her pose suggests she is at ease yet attentive, fitting for an office environment.

Please note that this description is based on the information available in the image and does not include any assumptions or interpretations beyond what can be confidently determined from the image itself.

a1.jpg

minicpm-v 8B

A woman is pictured in a professional office setting, showcasing her fashionable attire. She wears a black and white top with suspenders paired with brown plaid trousers, creating an elegant look that combines sophistication and practicality for workplace environments. The background features typical office elements such as desks, papers on the wall, a laptop computer, desk lamps, a filing cabinet, and binders. These items emphasize her professional surroundings while highlighting important aspects of work life in today's modern offices.

a2.jpg

llama3.2-vision 11B

The image depicts a young woman in an office setting, likely a secretary or administrative assistant. She is dressed in business attire, consisting of a white blouse with a black bow at the neck and a brown cardigan over it. Her short brown hair frames her face, which appears to be blurred out.
In the background, there are desks and chairs scattered throughout the room, suggesting that this is an office environment where employees work together. The overall atmosphere seems professional and organized, with no signs of clutter or disarray.
The image may have been taken for a company's website or social media profile to showcase their staff members in a positive light. It could also be used as part of a recruitment campaign to attract new talent to the organization.

a3.jpg

bakllava 7B

The image features a beautiful girl with a pleasant smile, standing in an office setting. She is wearing a white shirt and a tie that has a bow on it. The girl seems to be the main focus of the scene.
In the background, there is a desk with various items on it, such as two keyboards, a mouse, and three books. A chair can also be seen in the office space, positioned close to the desk.

a4.jpg

 

b.jpg

Source image B

This image was generated with FLUX.1. The original model was flux1DevPonyPVCTest_v10. The model used this time is waiREALCN_v14.

b1.jpg

llava-phi3 3.8B

The image features a young woman as the main subject. She is standing in a bustling mall, surrounded by blurred figures of people moving around her. The woman is dressed in a chic white blouse with black sleeves and a matching black skirt, creating a harmonious outfit. Her hair is neatly styled into an elegant updo.
She stands confidently with her hands placed on her hips, facing the camera directly. A slight smile graces her face as she looks at the camera, giving off a friendly vibe. The precise location of objects and their relative positions are not discernible due to the bustling background.
Please note that this description is based on the available information and does not include any assumptions or interpretations beyond what can be confidently determined from the image.

b2.jpg

minicpm-v 8B

The image features a young woman standing in an indoor setting, likely a shopping mall or similar public space. She is dressed semi-formally with a crisp white blouse and navy blue pleated skirt, accessorized with delicate jewelry including earrings, a necklace, and what appears to be rings on her fingers. Her hair is styled neatly in a bun held together by a decorative barrette. The background is slightly blurred but shows other individuals moving about their day, suggesting the photo was taken in a busy area during regular business hours.

b3.jpg

llama3.2-vision 11B

The image depicts a young woman with long black hair, styled in a half-up, half-down ponytail, standing in a bustling shopping mall or airport. She is dressed in a white blouse featuring a V-neck and puffy sleeves, paired with a navy blue pleated skirt that falls just above her knees.

In the foreground, the woman's hands are clasped together in front of her, conveying a sense of calmness and composure. Her attire is complemented by a silver necklace adorned with a circular pendant, adding a touch of elegance to her overall appearance.

The background of the image is blurred, but it appears to be a busy public area, with several people walking in different directions. The atmosphere seems relaxed, with individuals going about their daily activities without appearing rushed or stressed.

Overall, the image presents a serene and peaceful scene, capturing a moment of tranquility amidst the hustle and bustle of everyday life.

b4.jpg

bakllava 7B

The scene features a mall with multiple people walking and standing around. In the foreground, a woman wearing a white shirt is posing for a picture while smiling. She appears to be in a shopping mall with several shops around her.

There are other people scattered throughout the mall. Some of them can be seen holding handbags or wearing backpacks. One person stands out as they are carrying a larger backpack near the center of the image. Overall, it is a bustling scene showcasing various individuals going about their day at the mall.

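Overall, the choice of LLM did not seem to make much difference. As for the generated prompts, those from llama3.2-vision 11B were the most detailed and looked the best, with llava-phi3 in second place. The prompts from bakllava struck me as too sparse.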

Generating images with a local LLM…

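Because the prompt is regenerated every time, each output image leaves a different impression, which I liked. I have mostly been using Joycaption lately, so I plan to make more use of local LLMs as well.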
